Methods and compositions for site-specific labeling of peptides and proteins

ABSTRACT

Methods and compositions are provided for covalently linking a chemical species to a recombinant or synthetic polypeptide. The methods involve the reaction of a thioester-comprising polypeptide with a reagent comprising a reactive amino-thiol group connected to the chemical species which is to be covalently linked to the polypeptide, via a linker. Such chemical species can be a functional group, a label or tag molecule, a biological molecule, a ligand, or a solid support. Efficient and catalyst-free methods for C-terminal protein labeling are also provided. The methods expand current capabilities in the area of protein functionalization, providing useful and complementary tools for the isolation, detection, characterization, and analysis of proteins in a variety of in vitro and in vivo applications.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a national stage application under 35 U.S.C. §371 of PCT Application No. PCT/US2013/058322, filed Sep. 5, 2013, which claims priority to and the benefit of U.S. provisional patent application Ser. No. 61/698,045 entitled “Methods and Reagents for Site-Specific Labeling of Peptides and Proteins,” filed Sep. 7, 2012, which is incorporated herein by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under grant no. CHE-1112342 awarded by the National Science Foundation. The government has certain rights in the invention.

1. TECHNICAL FIELD

The present invention relates to methods and compositions for covalently linking a chemical species to a recombinant or synthetic polypeptide.

2. BACKGROUND OF THE INVENTION

Chemical methods for site-specific functionalization of proteins and peptides are useful in a variety of research and biomedical applications. For example, the site-specific attachment of a chromophore such as a fluorescent dye to a target protein can be useful to enable detection of such protein in a complex mixture or to track expression and localization of the target protein within a cell or living organism. On the other hand, site-specific functionalization of a protein with an affinity tag can be used to facilitate protein isolation, purification, and characterization. Site-specific functionalization can also be useful in the preparation of protein microarrays, which in turn can be useful for screening protein-ligand, protein-protein, and antigen-antibody interactions. As another example, methods to chemically link a protein such as a therapeutic protein to a polymer (e.g., polyethylene glycol), a small-molecule drug, a cell receptor ligand, or another protein or peptide can be valuable to enhance and modulate the pharmacological, pharmacokinetic, or tissue-targeting properties of the therapeutic protein.

Several methods for the functionalization of peptides and proteins are known in the art (see, e.g., Hermanson 1996; Jing and Cornish 2011; Crivat and Taraska 2012). Conventional strategies have taken advantage of nucleophilic side-chain functionalities in certain amino acids (e.g., thiol group in cysteine, amino group in lysine) to couple a chemical species to the polypeptide via an electrophilic reagent (Hermanson 1996). An inherent limitation of these approaches is that more than one such amino acid can be present in the target polypeptide, preventing accurate control over the site-selectivity of the reaction. Furthermore, using these strategies, selective labeling of an individual protein in complex biological mixtures (e.g., cell lysate or within a cell) is not possible owing to the occurrence of numerous other proteins having similar reactive functionalities.

More recent approaches for protein labeling have involved the genetic fusion of a protein to a protein tag such as a fluorescent protein (e.g., green fluorescent protein and variants thereof) or an enzyme, which can be covalently modified via an irreversible inhibitor to indirectly link a certain chemical species (e.g., fluorophore or affinity label) to the protein of interest (Jing and Cornish 2011; Crivat and Taraska 2012). Examples of the latter include the so-called SNAP tag (Keppler, Gendreizig et al. 2003), HaloTag (Los, Encell et al. 2008), and the TMP-tag (Calloway, Choob et al. 2007). A common drawback of these approaches, however, is that permanent fusion of the target protein to a non-native protein tag may affect the biological function, dynamics, conformational properties, and/or cellular localization of the protein of interest.

Other approaches in the area of protein labeling have involved the use of short (e.g., 6-20 amino acid-long) peptide sequences which are genetically fused to the protein of interest and serve as recognition sites for enzyme-catalyzed posttranslational modifications. By action of these enzymes or engineered variants thereof and utilizing modified co-substrates, fluorophores or other small molecule labels have been attached to these peptide sequences, and thus, to the target protein. Examples of these strategies include the use of biotin ligase BirA (Chen, Howarth et al. 2005), sortase (Popp, Antos et al. 2007), lipoic acid ligase (Cohen, Zou et al. 2012), and phosphopantetheine transferase (PPTase) (Yin, Liu et al. 2004). Also in this case, however, the target protein must be permanently fused to a non-native peptide sequence, which can alter the properties of the former. Furthermore, the addition (or co-expression) of an auxiliary processing enzyme is required for both in vitro and in vivo applications.

In general, ‘traceless’ methods for protein labeling that involve no modifications or extensions of the primary sequence of the target protein are highly desirable in order to minimize the risks of altering its structure/function/cellular localization. In particular, the ability to site-specifically attach new chemical entities to the carboxy-terminus of a protein or enzyme is most valuable as the C-terminus is often solvent-exposed and typically not directly involved in binding or catalysis. Thus, efficient methods for C-terminal functionalization of a protein can be of great value toward protein labeling or immobilization under non-disruptive conditions.

Recently developed technologies have made possible the generation of recombinant proteins comprising a thioester group at their C-terminal end. The C-terminal thioester group provides a unique reactive chemical functionality within the protein which can be exploited for site-specific labeling of a target protein. Recombinant C-terminal thioester proteins can be generated by exploiting the mechanism of inteins, which are naturally occurring proteins capable of excising themselves from the internal region of a precursor polypeptide via a posttranslational process known as protein splicing (Paulus 2000). The first step in protein splicing involves an intein-catalyzed N→S (or N→O) acyl transfer in which the polypeptide chain flanking the intein N-terminus (N-extein) is transferred to the side-chain thiol or hydroxy group of a conserved cysteine, serine, or threonine residue at the N-terminus of the intein. Further intramolecular rearrangements follow that ultimately lead to the excision of the intein from the precursor polypeptide and the ligation of the N-extein unit to the C-extein unit (i.e., the polypeptide chain flanking the intein C-terminus) via a peptide bond. By genetically fusing a protein of interest to the N-terminus of engineered intein variants which are unable to undergo C-terminal splicing (e.g., via mutation of the conserved asparagine residue at the intein C-terminus or removal of the C-extein unit), it is possible to promote only the first step of protein splicing, thereby producing a recombinant protein with a reactive C-terminal thioester linkage. The sequencing and characterization of several naturally occurring intein-comprising proteins show that inteins share a similar mechanism as well as a number of conserved primary sequence regions called ‘intein motifs’, whereas generally there are no specific sequence requirements for the N- and C-extein units. To date, more than 500 experimentally validated and putative intein sequences have been identified.

The ability to generate recombinant C-terminal thioester proteins via the genetic fusion of a protein to the N-terminus of a natural intein, or engineered (or synthetic or artificial) variant thereof, provides the opportunity to link a chemical entity to the protein C-terminus via nucleophilic substitution at the thioester group. A known methodology in this area involves the reaction between a recombinant C-terminal thioester protein and another polypeptide (i.e., a recombinant or synthetic peptide/protein) comprising an N-terminal cysteine. This procedure, also known as Expressed Protein Ligation (Muir, Sondhi et al. 1998), involves an intermolecular transthioesterification reaction followed by an intramolecular S→N acyl shift to give a native peptide bond between the two polypeptide chains. Similarly, cysteine-comprising reagents have been used for labeling/immobilization of recombinant C-terminal thioester proteins (Chattopadhaya, Abu Bakar et al. 2009). Alternatively, and also in the context of protein labeling/immobilization applications, recombinant C-terminal thioester proteins have been functionalized at the C-terminus via the use of hydrazine-, hydrazide-, or oxyamine-comprising chemical reagents, in which the hydrazine, hydrazide, or oxyamine group acts as the nucleophile to promote the C-terminal ligation of the protein of interest to a given chemical species (e.g., a fluorescent dye) (Cotton, U.S. Pat. No. 7,622,552; Raines et al. U.S. Pat. Appl. 2008/0020942).

Unfortunately, all the aforementioned methods for protein C-terminal labeling are characterized by slow reaction kinetics resulting in low labeling efficiencies, in particular at short reaction times. In addition, high concentrations of reagents (either the target C-terminal thioester protein, or the labeling reagent, or both) are typically required to achieve satisfactory yields of the desired functionalized protein product. Furthermore, thiol catalysts such as, for example, thiophenol, mercaptoethanol, or MESNA, are typically necessary to expedite and/or increase the yields of these protein functionalization procedures. As a result of these drawbacks, the utility of these methods for protein C-terminal labeling/immobilization remains limited. For example, these reaction conditions can hardly be attained at the intracellular level, severely limiting the scope of these methods in the context of in vivo protein labeling applications. Furthermore, fast protein labeling procedures are required to enable the detection and isolation of transient or short-lived protein species in the context of proteomic or cell biology studies. Finally, the limited stability of certain proteins may not be compatible with the need for high reagent or catalyst concentrations associated with these methods.

Citation or identification of any reference in Section 2, or in any other section of this application, shall not be considered an admission that such reference is available as prior art to the present invention.

3. SUMMARY OF THE INVENTION

Methods, kits and compositions are provided for covalently linking a chemical species to a recombinant or synthetic polypeptide. The methods involve the reaction of a thioester-comprising polypeptide with a reagent comprising a reactive amino-thiol group connected to the chemical species which is to be covalently linked to the polypeptide, via a linker. Such chemical species may be a functional group, a label or tag molecule, a biological molecule, a ligand, or a solid support.

Efficient and catalyst-free methods for C-terminal protein labeling are also provided. These methods expand current capabilities in the area of protein functionalization, providing useful and complementary tools for the isolation, detection, characterization, and analysis of proteins in a variety of in vitro and in vivo applications.

A method is provided for forming a covalent linkage between a polypeptide and a chemical species, the method comprising the steps of:

-   a. providing a polypeptide, wherein the polypeptide comprises a thioester group and/or wherein the polypeptide is C-terminally fused to an intein;
-   b. providing a chemical reagent of formula (I), (II), (III), (IV), (V), (VI), (VII) or (VIII):

-   -   or a salt of the chemical reagent, wherein:
        -   i. R is a chemical species to be covalently linked to the polypeptide,
        -   ii. R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group,
        -   iii. X, Y, W, and Z are hydrogen and/or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, wherein each R′ is independently H, alkyl, or substituted alkyl,
        -   iv. n is 2 or 3; and
        -   v. L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)— group, where each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group; and
    -   c. allowing the polypeptide to react with the chemical reagent so that a covalent linkage between the reagent and the polypeptide is formed.

In one embodiment of the method, R is a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, or a quantum dot, or any combination thereof.

In another embodiment of the method,

-   -   R is a bioorthogonal functional group selected from the group consisting of —NR′NR′₂, —C(O)NR′NR′₂, —ONH₂, —N₃, —C≡CR′, —CR′═CR′₂, —PR′₂, 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups, and
    -   each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In another embodiment of the method, R is a fluorescent molecule selected from the group consisting of a coumarin derivative, a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative, a phthalocyanine derivative, and an oxazine derivative.

In another embodiment of the method, R is biotin, a biotin analogue, or a perfluorinated alkyl chain CF₃—(CF₂)_(m)— where m=3-15.

In another embodiment of the method, R is a poly(ethyleneglycol) molecule.

In another embodiment of the method, R is a resin or a nanoparticle.

In another embodiment of the method, R is a functionalized surface.

In another embodiment of the method, the surface is a microarray.

In another embodiment of the method, the intein is a naturally occurring intein, an engineered variant of a naturally occurring intein, a fusion of the N-terminal and C-terminal fragments of a naturally occurring split intein, or a fusion of the N-terminal and C-terminal fragments of an artificial split intein.

In another embodiment of the method, the intein is a polypeptide of SEQ ID NO:1-76, or an engineered (or synthetic) variant thereof.

In another embodiment of the method:

-   -   the C-terminal asparagine, aspartic acid, or glutamine residue in the intein is mutated to an amino acid other than asparagine, aspartic acid, or glutamine, or
    -   the N-terminal serine is mutated to a cysteine residue and the C-terminal asparagine, aspartic acid, or glutamine residue in the intein is mutated to an amino acid other than asparagine, aspartic acid, or glutamine.
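By way of illustration only, the two mutation schemes described above can be sketched at the sequence level. The following minimal Python sketch operates on a one-letter amino acid string; the function name, the default replacement residue (alanine), and the toy sequences are assumptions made for the example and are not taken from the sequences disclosed herein.

```python
# Residues that, at the intein C-terminus, are to be replaced
# (asparagine, aspartic acid, glutamine).
C_TERMINAL_TARGETS = set("NDQ")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def mutate_intein(intein_seq: str, replacement: str = "A",
                  ser_to_cys: bool = False) -> str:
    """Return an intein variant with its C-terminal N/D/Q replaced.

    replacement: hypothetical default (alanine); any residue other than
                 N, D, or Q is accepted.
    ser_to_cys:  if True, also change an N-terminal serine to cysteine
                 (the second scheme described above).
    """
    seq = intein_seq.upper()
    if replacement not in AMINO_ACIDS or replacement in C_TERMINAL_TARGETS:
        raise ValueError("replacement must be a residue other than N, D, or Q")
    if len(seq) < 2 or seq[-1] not in C_TERMINAL_TARGETS:
        raise ValueError("intein does not end in N, D, or Q")
    if ser_to_cys:
        if seq[0] != "S":
            raise ValueError("intein does not start with serine")
        seq = "C" + seq[1:]
    return seq[:-1] + replacement

if __name__ == "__main__":
    # Toy sequences for illustration, not real inteins.
    print(mutate_intein("CLSFGTEHN"))
    print(mutate_intein("SLSFGTEHN", ser_to_cys=True))
```

The sketch only edits the terminal residues; whether a given variant actually supports thioester formation would have to be established experimentally.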

In another embodiment of the method, the intein is C-terminally fused to a polypeptide affinity tag selected from the group consisting of polyhistidine tag, Avi-Tag, FLAG tag, Strep-tag II, c-myc tag, S-Tag, calmodulin-binding peptide, streptavidin-binding peptide, chitin-binding domain, glutathione S-transferase, and maltose-binding protein. These tags and their sequences are well known in the art.

In another embodiment of the method, the polypeptide C-terminally fused to the intein comprises one or a plurality of the features selected from the group consisting of: the residue at position 1 prior to the intein (hereinafter “intein-1” or “I-1”) being F, Y, A, T, W, N, R or Q; the residue at position 2 prior to the intein (hereinafter “intein-2” or “I-2”) being G, P, or S; and the residue at position 3 prior to the intein (hereinafter “intein-3” or “I-3”) being G or S.
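As an illustration only, the residue preferences listed above can be checked programmatically. The following minimal Python sketch assumes a one-letter amino acid sequence for the polypeptide immediately preceding the intein, so that its last residue is position I-1; the function name and the example sequence are hypothetical.

```python
# Preferred residues at the three positions preceding the intein,
# as listed in the embodiment above.
PREFERRED = {
    "I-1": set("FYATWNRQ"),
    "I-2": set("GPS"),
    "I-3": set("GS"),
}

def junction_features(target_seq: str) -> dict:
    """Report which of the I-1, I-2, and I-3 preferences are met.

    target_seq is the one-letter sequence of the polypeptide fused to the
    N-terminus of the intein; its last residue is position I-1.
    """
    seq = target_seq.upper()
    if len(seq) < 3:
        raise ValueError("need at least three residues before the intein")
    return {
        "I-1": seq[-1] in PREFERRED["I-1"],
        "I-2": seq[-2] in PREFERRED["I-2"],
        "I-3": seq[-3] in PREFERRED["I-3"],
    }

if __name__ == "__main__":
    # Hypothetical sequence ending in S-G-F (I-3, I-2, I-1).
    print(junction_features("MKTLLSGF"))
```

Because the embodiment calls for one or a plurality of the features, a caller would typically accept a sequence for which at least one of the three values is true.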

In another embodiment of the method, the intein-fused polypeptide is inside a cell or associated with the exterior surface of a cell membrane. The polypeptide can be inside the cell, e.g., in the cytoplasm or in another intracellular compartment such as the nucleus, or on the surface of the cell, e.g. associated with the cell membrane on its interior or exterior surface.

In another embodiment of the method, the cell is a prokaryotic or eukaryotic cell.

In another embodiment of the method, the prokaryotic cell is E. coli.

In another embodiment of the method, the eukaryotic cell is a yeast cell, an insect cell, a worm cell, a fish cell or a mammalian cell.

In another embodiment of the method, R₁, X, Y, and Z are hydrogen atoms,

-   -   L is selected from the group consisting of —C(O)NR′—, —C(O)NR′CH₂C(O)—, —C(O)NR′(CH₂)n-, and —C(O)NR′(CH₂CH₂—O)n-,
    -   R′ is a hydrogen, alkyl or aryl group, and
    -   n is an integer number from 1 to 15.

In another embodiment of the method, R is selected from the group consisting of biotin, a biotin analogue, and a coumarin derivative.

In another embodiment of the method, the reagent is:

-   -   a. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂ or —N₃, and
    -   L is a single bond;
    -   b. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂, and
    -   L is a linker or linker group of formula

-   -   c. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is 7-amino-4-(trifluoromethyl)-2H-chromen-2-one, and
    -   L is —C(O)NHCH₂C(O)—; or
    -   d. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is biotin, and
    -   L is —C(O)NH(CH₂)₃NH—.

A kit is provided for forming a covalent linkage between a polypeptide and a chemical species, the kit comprising:

-   -   a. at least one chemical reagent of formula (I), (II), (III), (IV), (V), (VI), (VII), or (VIII), or a salt of the reagent; and
    -   b. one or a plurality of containers, wherein at least one container comprises a pre-selected or desired amount of at least one of the chemical reagents of formula (I), (II), (III), (IV), (V), (VI), (VII), or (VIII), or a salt of the reagent, wherein:
        -   i. R is the chemical species which is to be covalently linked to the polypeptide,
        -   ii. R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group,
        -   iii. X, Y, W, and Z are hydrogen and/or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group,
        -   iv. n is 2 or 3, and
        -   v. L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)—, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In one embodiment of the kit, R is a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, or a quantum dot, or any combination thereof.

In another embodiment of the kit, R is a bioorthogonal functional group selected from the group consisting of —NR′NR′₂, —C(O)NR′NR′₂, —ONH₂, —N₃, —C≡CR′, —CR′═CR′₂, —PR′₂, 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In another embodiment of the kit, R is a fluorescent molecule selected from the group consisting of a coumarin derivative, a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative, a phthalocyanine derivative, and an oxazine derivative.

In another embodiment of the kit, R is biotin, a biotin analogue, or a perfluorinated alkyl chain CF₃—(CF₂)_(m)— where m=3-15.

In another embodiment of the kit, the at least one reagent comprises at least one compound selected from the group consisting of:

-   -   a. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂ or —N₃, and
    -   L is a single bond;
    -   b. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂, and
    -   L is a linker or linker group of formula

-   -   c. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is 7-amino-4-(trifluoromethyl)-2H-chromen-2-one, and
    -   L is —C(O)NHCH₂C(O)—; or
    -   d. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is biotin, and
    -   L is —C(O)NH(CH₂)₃NH—.

In another embodiment of the kit, the kit further comprises a functionalized solid support with which the functional group R reacts. Functionalized solid supports and surfaces with which functional groups R can react are well known in the art.

A kit is also provided for immobilizing a polypeptide to a surface, the kit comprising:

-   -   a. a chemical reagent of formula (Ib), (IIb), (IIIb), (IVb), (Vb), (VIb), (VIIb), or (VIIIb):

and

-   -   b. one or a plurality of containers, wherein at least one container comprises a surface to which a chemical reagent of formula (Ib), (IIb), (IIIb), (IVb), (Vb), (VIb), (VIIb), or (VIIIb) is covalently bound, and wherein:
        -   i. R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group,
        -   ii. X, Y, W, and Z are hydrogen or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, and wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group,
        -   iii. n is 2 or 3, and
        -   iv. L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)—, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In one embodiment of the kit, the surface is a solid support.

In another embodiment of the kit, the solid support is a resin, a nanoparticle, or the surface of a microarray.

A compound (also referred to herein as a “reagent”, a “chemical reagent” or a “composition”) is provided having the formula (I), (II), (III), (IV), (V), (VI), (VII) or (VIII):

or a salt thereof, wherein:

-   -   i. R is a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, a quantum dot, or any combination thereof,
    -   ii. R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group,
    -   iii. X, Y, W, and Z are hydrogen and/or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, wherein each R′ is independently H, alkyl, or substituted alkyl,
    -   iv. n is 2 or 3; and
    -   v. L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)— group, where each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In one embodiment of the compound, R is a bioorthogonal functional group selected from the group consisting of —NR′NR′₂, —C(O)NR′NR′₂, —ONH₂, —N₃, —C≡CR′, —CR′═CR′₂, —PR′₂, 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups, and

-   -   each R′ is independently an H, an aliphatic, a substituted         aliphatic, an aryl, or a substituted aryl group.

In another embodiment of the compound, R is a fluorescent molecule selected from the group consisting of a coumarin derivative, a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative, a phthalocyanine derivative, and an oxazine derivative.

In another embodiment of the compound, R is biotin, a biotin analogue, or a perfluorinated alkyl chain CF₃—(CF₂)_(m)— where m=3-15.

In another embodiment of the compound, R is a poly(ethyleneglycol) molecule.

In another embodiment of the compound, R is a resin or a nanoparticle.

In another embodiment of the compound, R is a functionalized surface.

In another embodiment of the compound, R₁, X, Y, and Z are hydrogen atoms,

-   -   L is selected from the group consisting of —C(O)NR′—, —C(O)NR′CH₂C(O)—, —C(O)NR′(CH₂)n-, and —C(O)NR′(CH₂CH₂—O)n-,
    -   R′ is a hydrogen, alkyl or aryl group, and
    -   n is an integer number from 1 to 15.

In another embodiment of the compound, R is selected from the group consisting of biotin, a biotin analogue, and a coumarin derivative.

In another embodiment of the compound, the compound has formula (I), wherein:

-   -   a. R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂ or —N₃, and
    -   L is a single bond;
    -   b. R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂, and
    -   L is a linker or linker group of formula

-   -   c. R₁, X, Y, and Z are hydrogen atoms,
    -   R is 7-amino-4-(trifluoromethyl)-2H-chromen-2-one, and
    -   L is —C(O)NHCH₂C(O)—; or
    -   d. R₁, X, Y, and Z are hydrogen atoms,
    -   R is biotin, and
    -   L is —C(O)NH(CH₂)₃NH—.

Methods for synthesizing the foregoing compounds are also provided.

4. BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is described herein with reference to the accompanying drawings, in which similar reference characters denote similar elements throughout the several views. It is to be understood that in some instances, various aspects of the invention may be shown exaggerated or enlarged to facilitate an understanding of the invention.

FIG. 1. Schematic representation of one embodiment of the invention illustrating the application of the methods described herein for C-terminal functionalization of an intein-fused target polypeptide via reagents of type (I)-(IV) or reagents of type (V)-(VIII).

FIG. 2. Synthetic route for the preparation of various reagents of general formula (I) comprising either a bioorthogonal oxyamino functional group (compounds 8, 9, 10A, 10B), a bioorthogonal azide functional group (compound 6) or a carboxylic acid group (compound 11) as the R group.

FIG. 3. Synthetic route for the preparation of a reagent of general formula (II) comprising a carboxylic acid group as the R group.

FIG. 4. Synthetic route for the preparation of a reagent of general formula (I) comprising a coumarin-based fluorescent probe molecule as the R group.

FIG. 5. Synthetic route for the preparation of a reagent of general formula (I) comprising a biotin-based affinity tag molecule as the R group.

FIG. 6. Synthetic route for the preparation of reagents of general formula (V).

FIG. 7A-C. Functionalization of the target intein-fusion protein CBD-3 with 1-amino-2-(mercaptomethyl)-aryl-based reagents 11 and 17. A) General scheme of the protein labeling reactions. B) Percentage of protein labeling at different time points in the presence of different concentrations of reagents 11 and 17 as measured by SDS-PAGE; C) MALDI-TOF MS spectra of the labeled protein products.

FIG. 8A-C. Fluorescent labeling of the target intein-fusion protein CBD-2 with coumarin-comprising reagent 23. A) General scheme of the protein labeling reaction. B) SDS-PAGE gel analysis of the reaction between CBD-2 and 23 after 1 hour (lane 1), 5 hours (lane 2), and 12 hours (lane 3). A protein MW marker is included. Left panel: Coomassie blue-stained gel. Right panel: fluorescence visualization of the gel upon excitation with 365-nm light. C) MALDI-TOF MS spectra of the desired fluorescently labeled protein products.

FIG. 9A-C. Biotinylation of the target intein-fusion protein CBD-2 with biotin-comprising reagent 26. A) General scheme of the protein labeling reaction. B) SDS-PAGE gel analysis of the reaction between CBD-2 and 26 after 1 hour (lane 1), 5 hours (lane 2), and 12 hours (lane 3). C) MALDI-TOF MS spectrum of the desired biotinylated protein product.

FIG. 10. Percentage of protein labeling at different time points for the reaction between protein CBD-2 and reagents 23 and 26 as determined by SDS-PAGE gel densitometry.

FIG. 11A-C. Functionalization of target protein CBD-1 with oxyamino-comprising reagent 8. A) General scheme of the protein labeling reaction. B) Percentage of protein labeling over time as determined by SDS-PAGE gel densitometry. C) MALDI-TOF MS spectrum of the desired CBD-8 product.

FIG. 12A-C. Functionalization of target protein CBD-3 with oxyamino-comprising reagent 9. A) General scheme of the protein labeling reaction. B) Percentage of protein labeling at the 1-, 2-, 3-, 6-, 12-, and 24-hour time points in the presence of different concentrations of 9 as determined by SDS-PAGE gel densitometry. C) MALDI-TOF MS spectrum of the reaction mixture, indicating the clean formation of the oxyamine-functionalized protein product, CBD-9.

FIG. 13A-C. Protein biotinylation with biotin-comprising reagent 26. A) General scheme of the protein labeling reaction. B) Percentage of protein labeling at the 1-, 2-, 3-, 6-, 12-, and 24-hour time points in the presence of different concentrations of 26 as determined by SDS-PAGE gel densitometry. C) MALDI-TOF MS spectrum of the reaction mixture, indicating the clean formation of the desired biotinylated protein product, CBD-26.

FIG. 14. Protein labeling in cell lysate with reagents 26 and 9. MALDI-TOF MS spectra of cell lysates of CBD-3-expressing E. coli cells after incubation with reagent 26 or reagent 9 at 10 mM for 4 hours. The peaks corresponding to the desired functionalized protein products, CBD-26 and CBD-9, respectively, are indicated.

FIG. 15A-B. Protein labeling in living E. coli cells. A) General scheme of the protein labeling reaction. Briefly, E. coli cells expressing CBD-3 were incubated with compound 26, washed and then lysed. B) MALDI-TOF MS spectra of the cell lysates after the labeling procedure (at 5 and 10 mM reagent concentration), indicating the formation of the desired biotinylated protein product, CBD-26. The minor product (CBD-COOH) resulting from spontaneous hydrolysis of the intein-fusion protein is also indicated.

FIG. 16A-C. Affinity purification of in vivo biotinylated protein. A) Schematic representation of the affinity purification procedure for isolating the in vivo biotinylated protein CBD-26 with streptavidin-coated beads. B-C) MALDI-TOF spectra of the E. coli cell lysate after in vivo labeling of CBD-3 with compound 26 prior to (B) and after (C) the biotin-capturing procedure.

FIG. 17A-B. In vitro protein labeling with an N-(2-mercaptoethyl)-amino-aryl-based reagent. A) General scheme of the protein labeling reaction. B) Percentage of protein labeling at different time points for the reaction between CBD-2 and reagent 30 as determined by SDS-PAGE gel densitometry.

FIG. 18A-B. In vivo protein labeling with an N-(2-mercaptoethyl)-amino-aryl-based reagent. A) General scheme of the protein labeling reaction. Briefly, E. coli cells expressing CBD-3 were incubated with compound 34, then washed, and lysed. B) MALDI-TOF MS spectra of the cell lysates after the labeling procedure (34 at 10 mM), indicating the formation of the functionalized protein product, CBD-34. The minor product (CBD-COOH) resulting from spontaneous hydrolysis of the intein-fusion protein is also indicated.

5. DETAILED DESCRIPTION OF THE INVENTION

Methods, kits and compositions are provided for covalently linking a chemical species to a recombinant or synthetic polypeptide. The methods involve the reaction of a thioester-comprising polypeptide with a reagent comprising a reactive amino-thiol group connected to the chemical species which is to be covalently linked to the polypeptide, via a linker. Such chemical species may be, for example, a functional group, a label or tag molecule, a biological molecule, a ligand, or a solid support.

Efficient and catalyst-free methods for C-terminal protein labeling are also provided. These methods expand current capabilities in the area of protein functionalization, providing useful and complementary tools for the isolation, detection, characterization, and analysis of proteins in a variety of in vitro and in vivo applications.

For clarity of disclosure, and not by way of limitation, the detailed description of the invention is divided into the subsections set forth below.

5.1. Methods

Methods, kits and compositions (also referred to herein as “reagents”) for site-selective functionalization of proteins and peptides are provided. The site-selective functionalization methods provided herein overcome a number of problems associated with previous methods for site-selective functionalization of proteins and peptides and, generally, involve the reaction between a protein or peptide comprising a permanent or transiently formed thioester group at its C-terminus with a chemical reagent comprising a reactive amino-thiol group.

The methods and reagents provided herein can be applied to covalently link a polypeptide (i.e., a protein or a peptide) to another chemical entity, which may be a functional group, a label or tag molecule (e.g., a fluorescent dye, an affinity tag, or an isotopically labeled molecule), a biological molecule (e.g., a peptide, a protein, a carbohydrate, a nucleoside or nucleotide, or a lipid), a small molecule (e.g., a protein-, nucleic acid-, or receptor-binding ligand, a drug or drug candidate), or a solid support (e.g., a solid surface or a resin bead). The functionalization procedure can be carried out under mild reaction conditions, that is, in aqueous buffer, at pH ranging from 6.0 to 9.0, and at temperatures ranging from 4 to 40° C. The ability to perform this procedure under mild conditions minimizes the risk of denaturation or degradation of the target protein or peptide which is to be functionalized. The functionalization can be carried out in vitro, that is, in a cell-free environment, or in vivo, that is, with the target protein or peptide residing inside a cell or being covalently or non-covalently attached to the surface of a cell.
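As a minimal sketch, the window of mild conditions stated above can be encoded as a simple check. The function name and the inclusive treatment of the range endpoints are assumptions made for illustration; only the numeric ranges are taken from the text.

```python
# Illustrative only: encodes the pH and temperature ranges stated above.
PH_RANGE = (6.0, 9.0)        # aqueous buffer pH
TEMP_RANGE_C = (4.0, 40.0)   # degrees Celsius


def within_mild_conditions(ph: float, temp_c: float) -> bool:
    """Return True if both values fall inside the stated ranges.

    Endpoints are treated as included; that choice is an assumption.
    """
    ph_ok = PH_RANGE[0] <= ph <= PH_RANGE[1]
    temp_ok = TEMP_RANGE_C[0] <= temp_c <= TEMP_RANGE_C[1]
    return ph_ok and temp_ok
```

For instance, a reaction planned at pH 7.5 and 25° C. falls inside the window, whereas one at pH 5.5 does not.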

Accordingly, a method is provided for linking a chemical entity or species to the C-terminus of a target polypeptide, the method comprising the steps of:

-   -   a) providing a polypeptide comprising a permanent or transiently formed thioester group at its C-terminus;
    -   b) providing a chemical species of general formula I, II, III, or IV:

-   -   -   or salts thereof wherein:
        -   R is the chemical entity or species which is to be covalently linked to the target polypeptide;
        -   R₁ is hydrogen, or an aliphatic, substituted aliphatic, aryl, or substituted aryl group;
        -   X, Y, and Z are each hydrogen or a non-hydrogen substituent;
        -   L is a linker group; and

    -   c) allowing the polypeptide to react with the chemical species         of general formula I, II, III, or IV so that a covalent linkage         between the reagent and the polypeptide is formed by virtue of a         nucleophilic substitution reaction at the level of the thioester         group.

In a specific embodiment, the method comprises the steps of:

-   a. providing a polypeptide, wherein the polypeptide comprises a thioester group and/or wherein the polypeptide is C-terminally fused to an intein;
-   b. providing a chemical reagent of formula (I), (II), (III), (IV), (V), (VI), (VII) or (VIII):

-   -   or a salt of the chemical reagent, wherein:
        -   i. R is a chemical species to be covalently linked to the polypeptide,
        -   ii. R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group,
        -   iii. X, Y, W, and Z are hydrogen and/or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, wherein each R′ is independently H, alkyl, or substituted alkyl,
        -   iv. n is 2 or 3; and
        -   v. L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)— group, where each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group; and
-   c. allowing the polypeptide to react with the chemical reagent so that a covalent linkage between the reagent and the polypeptide is formed.

In one embodiment of the method, R is a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, or a quantum dot, or any combination thereof.

In another embodiment of the method,

-   -   R is a bioorthogonal functional group selected from the group consisting of —NR′NR′₂, —C(O)NR′NR′₂, —ONH₂, —N₃, —C≡CR′, —CR′═CR′₂, —PR′₂, 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups, and
    -   each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In another embodiment of the method, R is a fluorescent molecule selected from the group consisting of a coumarin derivative, a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative, a phthalocyanine derivative, and an oxazine derivative.

In another embodiment of the method, R is biotin, a biotin analogue, or a perfluorinated alkyl chain CF₃—(CF₂)_(m)— where m=3-15.

In another embodiment of the method, R is a poly(ethyleneglycol) molecule.

In another embodiment of the method, R is a resin or a nanoparticle.

In another embodiment of the method, R is a functionalized surface.

In another embodiment of the method, the surface is a microarray.

In another embodiment of the method, the intein is a naturally occurring intein, an engineered variant of a naturally occurring intein, a fusion of the N-terminal and C-terminal fragments of a naturally occurring split intein, or a fusion of the N-terminal and C-terminal fragments of an artificial split intein.

In another embodiment of the method, the intein is a polypeptide of SEQ ID NO:1-76, or an engineered (or synthetic) variant thereof.

In another embodiment of the method:

-   -   the C-terminal asparagine, aspartic acid, or glutamine residue in the intein is mutated to an amino acid other than asparagine, aspartic acid, or glutamine, or
    -   the N-terminal serine is mutated to a cysteine residue and the C-terminal asparagine, aspartic acid, or glutamine residue in the intein is mutated to an amino acid other than asparagine, aspartic acid, or glutamine.

In another embodiment of the method, the intein is C-terminally fused to a polypeptide affinity tag selected from the group consisting of polyhistidine tag, Avi-Tag, FLAG tag, Strep-tag II, c-myc tag, S-Tag, calmodulin-binding peptide, streptavidin-binding peptide, chitin-binding domain, glutathione S-transferase, and maltose-binding protein. These tags and their sequences are well known in the art.

In another embodiment of the method, the polypeptide C-terminally fused to the intein comprises one or a plurality of the features selected from the group consisting of: the residue at position 1 prior to the intein (hereinafter “intein-1” or “I-1”) being F, Y, A, T, W, N, R or Q; the residue at position 2 prior to the intein (hereinafter “intein-2” or “I-2”) being G, P, or S; and the residue at position 3 prior to the intein (hereinafter “intein-3” or “I-3”) being G or S.
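The residue preferences at positions I-1, I-2, and I-3 can be sketched as a short check on a candidate target sequence. This is an illustration only: the helper name `flanking_features`, the use of one-letter amino acid codes, and the convention that the last residue of the supplied sequence is I-1 are assumptions, not part of the disclosure.

```python
# Residues named in the embodiment above, keyed by position upstream of the intein.
PREFERRED = {
    "I-1": set("FYATWNRQ"),
    "I-2": set("GPS"),
    "I-3": set("GS"),
}


def flanking_features(target_seq: str) -> dict:
    """Report which of the I-1, I-2, and I-3 features a target sequence satisfies.

    target_seq is the one-letter sequence of the polypeptide fused upstream of
    the intein, so its last residue is taken as I-1 (hypothetical helper).
    """
    seq = target_seq.upper()
    if len(seq) < 3:
        raise ValueError("need at least three residues upstream of the intein")
    residues = {"I-1": seq[-1], "I-2": seq[-2], "I-3": seq[-3]}
    return {pos: residues[pos] in allowed for pos, allowed in PREFERRED.items()}
```

Because the embodiment requires only "one or a plurality" of the features, the function reports each feature separately rather than returning a single pass or fail value.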

In another embodiment of the method, the intein-fused polypeptide is inside a cell or associated with the exterior surface of a cell membrane. The polypeptide can be inside the cell, e.g., in the cytoplasm or in another intracellular compartment such as the nucleus, or on the surface of the cell, e.g. associated with the cell membrane on its interior or exterior surface.

In another embodiment of the method, the cell is a prokaryotic or eukaryotic cell.

In another embodiment of the method, the prokaryotic cell is E. coli.

In another embodiment of the method, the eukaryotic cell is a yeast cell, an insect cell, a worm cell, a fish cell or a mammalian cell.

In another embodiment of the method, R₁, X, Y, and Z are hydrogen atoms,

-   -   L is selected from the group consisting of —C(O)NR′—, —C(O)NR′CH₂C(O)—, —C(O)NR′(CH₂)n-, and —C(O)NR′(CH₂—CH₂—O)n-,
    -   R′ is a hydrogen, alkyl or aryl group, and
    -   n is an integer number from 1 to 15.

In another embodiment of the method, R is selected from the group consisting of biotin, a biotin analogue, and a coumarin derivative.

In another embodiment of the method, the reagent is:

-   -   a. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂ or —N₃, and
    -   L is a single bond;
    -   b. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂, and
    -   L is a linker or linker group of formula

-   -   c. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is 7-amino-4-(trifluoromethyl)-2H-chromen-2-one, and
    -   L is —C(O)NHCH₂C(O)—; or
    -   d. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is biotin, and
    -   L is —C(O)NH(CH₂)₃NH—.

A method is also provided for linking a chemical entity or species to the C-terminus of a target polypeptide, the method comprising the steps of:

-   -   a) providing a polypeptide comprising a permanent or transiently formed thioester group at its C-terminus;
    -   b) providing a chemical species of general formula V, VI, VII, or VIII:

-   -   -   or salts thereof wherein:
        -   R is the chemical entity or species which is to be covalently linked to the target polypeptide;
        -   n is 2 or 3;
        -   X, Y, W, and Z are each hydrogen or a non-hydrogen substituent;
        -   L is a linker group; and

    -   c) allowing the polypeptide to react with the chemical species         of general formula V, VI, VII, or VIII so that a covalent         linkage between the reagent and the polypeptide is formed by         virtue of a nucleophilic substitution reaction at the level of         the thioester group.

The reactivity of the reagents of formula (I) through (VIII) toward functionalization of a thioester-comprising polypeptide is conferred by the amino-thiol moiety comprised in these compounds (i.e., the 1-amino-2-(mercaptomethyl)-aryl moiety in compounds (I)-(IV) and the N-(2-mercaptoethyl)-amino-aryl moiety in compounds (V)-(VIII)) as discovered by the inventors. These amino-thiol moieties are able to efficiently promote a nucleophilic substitution at the C-terminal thioester group, thereby forming a covalent linkage between the target polypeptide and the reagent, and thus between the target polypeptide and the chemical entity or species comprised in the reagent.

As described in FIG. 1, this reaction typically involves a transthioesterification reaction by action of the thiol group in the reagents (I)-(VIII) to generate a stable thioester product (product ‘a’ in FIG. 1). This reaction product can then undergo an intramolecular S→N acyl transfer reaction to give a stable amide linkage between the reagent of formula (I)-(VIII) and the polypeptide which is to be functionalized (product ‘b’ in FIG. 1). For the purpose of protein/peptide functionalization, both the thioester product (product ‘a’) and the amide product (product ‘b’) are useful, although the latter is expected to generally exhibit greater stability against hydrolysis and thus, depending on the specific application of the methods provided herein, may in some cases be preferred.
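The two steps described above can be written out as a simplified scheme. The sketch below is drawn only for the 1-amino-2-(mercaptomethyl)-aryl reagents (I)-(IV) and is not a reproduction of FIG. 1. The symbol R″ for the thiol component originally bound in the polypeptide thioester and the shorthand Ar for the aryl ring are assumptions introduced here for illustration, and the point of attachment of L–R on the ring is left unspecified.

```latex
% Requires amsmath for align*, \xrightarrow, and \text.
\begin{align*}
\text{polypeptide--C(O)--S}R'' \;+\; \text{HS--CH}_2\text{--Ar(NH}R_1\text{)--L--R}
  &\xrightarrow{\text{transthioesterification}}
  \text{polypeptide--C(O)--S--CH}_2\text{--Ar(NH}R_1\text{)--L--R} \;+\; R''\text{SH}
  && \text{(product `a')} \\
\text{polypeptide--C(O)--S--CH}_2\text{--Ar(NH}R_1\text{)--L--R}
  &\xrightarrow{\;S \to N \text{ acyl transfer}\;}
  \text{polypeptide--C(O)--N(}R_1\text{)--Ar(CH}_2\text{SH)--L--R}
  && \text{(product `b')}
\end{align*}
```

In this representation, product ‘a’ retains a thioester bond to the reagent, whereas in product ‘b’ the acyl group has moved to the aryl nitrogen, leaving a free thiol on the benzylic carbon.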

The R₁ group in the reagents of formula (I), (II), (III), and (IV) can be hydrogen, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group. The nature of the R₁ group can affect the rate of the intramolecular S→N acyl transfer process after the transthioesterification reaction, that is the conversion of product ‘a’ into product ‘b’ in FIG. 1. In general, when the R₁ group is small (e.g., hydrogen atom, methyl or ethyl group) the formation of product ‘b’ is favored, whereas when the R₁ group is large (e.g., phenyl or benzyl group) the formation of product ‘a’ is favored. The choice of the R₁ group is thus made according to the specific applications of the methods provided herein and the preferred product (either product ‘a’ or product ‘b’) in each case. Preferably, the R₁ group is selected from the group consisting of hydrogen, methyl, ethyl, and propyl group. Most preferably, the R₁ group is hydrogen.

L is a linker or a linker group that provides a spacer function between the R group and the thioester-reactive amino-thiol moiety in reagents (I) through (VIII). In one embodiment, L is a linker or a linker group selected from the group consisting of aliphatic, substituted aliphatic, aryl, substituted aryl, heteroatom-comprising aliphatic, substituted heteroatom-comprising aliphatic, heteroatom-comprising aryl, substituted heteroatom-comprising aryl, alkoxy, and aryloxy groups. In particular, L is a linker group selected from the group consisting of C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)— group, where each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In some embodiments, L is an amino acid such as, for example, the α-amino acid glycine. In other embodiments, L is a polymer such as poly(ethyleneglycol). In still other embodiments, L is a polyether of formula —(CH₂—CH₂—O)_(n)—, where n is an integer number between 1 and 15.

The X, Y, W, and Z groups in the compounds of formula (I) through (VIII) can be hydrogen atoms or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, hydroxyl (—OH), ether (—OR′), thioether (—SR′), carboxy (—COOH), ester (—COOR′), amide (—CONR′₂), amino (—NR′₂), nitro (—NO₂), sulfo (—SO₂—OH), sulfono (—SO₂—OR′), sulfonamide (—SO₂NR₂′), cyano (—C≡N), cyanato (—O—C≡N), thiocyanato (—S—C≡N), phosphono (—P(O)(OR′)₂), and phosphate (—O—P(O)(OR′)₂) groups, where each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group. In addition, any of the non-hydrogen substituents X, Y, W, and Z can be connected to one or more of the other substituents to form a ring structure. For example, the substituent X in a compound of formula (III) can be connected to either Y or Z or both to form a ring structure. Non-limiting examples of ring structures include furan, thiophene, pyrrole, pyrroline, pyrrolidine, dioxolane, oxazole, thiazole, imidazole, imidazoline, imidazolidine, pyrazole, pyrazoline, pyrazolidine, isoxazole, isothiazole, oxadiazole, triazole, thiadiazole, pyran, pyridine, piperidine, dioxane, morpholine, dithiane, thiomorpholine, pyridazine, pyrimidine, pyrazine, piperazine, triazine, trithiane, indolizine, indole, isoindole, indoline, benzofuran, benzothiophene, indazole, benzimidazole, benzthiazole, purine, quinolizine, quinoline, isoquinoline, cinnoline, phthalazine, quinazoline, quinoxaline, naphthyridine, pteridine, quinuclidine, carbazole, acridine, phenazine, phenothiazine, phenoxazine, phenyl, indene, naphthalene, azulene, fluorene, anthracene, and phenanthrene groups.

The use of non-hydrogen substituents as the X, Y, W, or Z groups can be useful to modulate the physico-chemical properties of the reagents (I)-(VIII), such as, for example, their water-solubility or cell permeability. At the same time, the replacement of these groups with sterically bulky substituents can affect the reactivity of the reagents toward functionalization of the target thioester-comprising polypeptide, in particular when the substituent is most proximal (i.e., in ortho position) to the thiol-comprising substituent (i.e., the methanethiol group in compounds (I)-(IV); the aminoalkylthiol group in compounds (V)-(VIII)). Accordingly, it is generally preferable that none, one, or at most two of the X, Y, W, and Z groups are non-hydrogen substituents. In particular, it is generally preferred that the position ortho to the thiol-comprising substituent is occupied by a hydrogen atom (e.g., X═H in compounds of general formula (I), (II), and (III)).
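The two stated preferences (at most two non-hydrogen substituents, and hydrogen at the position ortho to the thiol-comprising substituent) can be sketched as a simple screening function. The function name, the dictionary representation, and the default of "X" as the ortho position are assumptions for illustration; the default follows only the example given for formulas (I), (II), and (III) and would need to be set explicitly for the other formulas.

```python
def substituent_pattern_ok(
    substituents: dict,
    ortho_position: str = "X",
    max_non_hydrogen: int = 2,
) -> bool:
    """Check a substituent assignment against the stated preferences.

    substituents maps position labels ("X", "Y", "W", "Z") to a substituent
    name, with "H" meaning hydrogen. ortho_position names the position ortho
    to the thiol-comprising substituent (assumed to be "X" by default).
    """
    # Positions carrying anything other than hydrogen.
    non_hydrogen = [pos for pos, group in substituents.items() if group != "H"]
    # A position missing from the mapping is treated as hydrogen.
    ortho_is_hydrogen = substituents.get(ortho_position, "H") == "H"
    return len(non_hydrogen) <= max_non_hydrogen and ortho_is_hydrogen
```

A pattern with X = H and a single methoxy group at Y passes this check, whereas a pattern with a methyl group at X, or with three non-hydrogen substituents, does not.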

With respect to the linker or linker group L comprised in the reagents of general formula (I) through (VIII), the L group is chosen so that, preferably, none of the substituents or functional groups comprised within this group can react with a thiol or amino group, or any of the functional groups comprised in the R group. Similarly, when any of the X, Y, W, or Z groups is a non-hydrogen substituent, the X, Y, W, or Z groups are chosen so that, preferably, none of these groups or functional groups comprised within these groups can react with a thiol or amino group, or any of the functional groups comprised in the R group. Those of ordinary skill in the art can select suitable linkers or linker groups L that meet these requirements based on general knowledge in the art. Accordingly, the L, X, Y, W, and Z groups preferably do not comprise thiol groups, selenol groups, thioester groups, aldehyde or ketone groups, α,β-unsaturated acid, α,β-unsaturated amide, or α,β-unsaturated ester groups, α-halo-acid, α-halo-amide, or α-halo-ester groups, unless these groups are protected with suitable protecting groups which make them unreactive under the conditions applied in the methods provided herein. A large amount of information is known in the art concerning the use of protecting groups and one of ordinary skill in the art will be capable of selecting appropriate protecting groups for a given application.

The R group can be any chemical entity or species that is to be covalently linked to the target thioester-comprising polypeptide. Accordingly, in one embodiment, the R group is selected from the group consisting of a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule (e.g., a protein-, nucleic acid-, or receptor-binding ligand, or a drug or drug candidate), or a solid support (e.g., a solid surface or a resin bead).

In some embodiments, the R group in reagents (I) through (VIII) is a functional group. In some specific embodiments, the R group is a bioorthogonal functional group. Several bioorthogonal functional groups are known in the art and these include, but are not limited to, hydrazino (—NHNH₂), hydrazido (—C(O)NHNH₂), oxyamino (—ONH₂), azido (—N₃), alkynyl (—C≡CR′), alkenyl (—CR′═CR′₂), phosphine (—PR′₂), 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, norbornadiene, boronaryl (Ar—B(OH)₂), bromoaryl (Ar—Br), and iodoaryl (Ar—I) groups, where R′ is a hydrogen, alkyl or aryl group and Ar is an aryl group. In specific embodiments, the R group is a hydrazino (—NR′NR′₂), hydrazido (—C(O)NR′NR′₂), oxyamino (—ONH₂), azido (—N₃), alkynyl (—C≡CR′), alkenyl (—CR′═CR′₂), phosphine (—PR′₂), 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, or norbornadiene group, where each R′ is independently H, aliphatic, substituted aliphatic, aryl, or substituted aryl group.

When R is a bioorthogonal functional group, such functional group can be used to further couple the functionalized polypeptide to another chemical entity according to methods known in the art. For example, an alkynyl group (—C≡CR′) and azido (—N₃) group can be engaged in a bioorthogonal bond-forming reaction (i.e., Huisgen 1,3-dipolar cycloaddition) via the addition of Cu(I) as catalyst or using a strained alkyne (e.g., cyclooctyne). A bioorthogonal Staudinger ligation can be carried out between a phosphine (—PR′₂) and an azido group. A tetrazole and an alkenyl group (—CR′═CR′₂) can be engaged in a bioorthogonal bond-forming reaction (‘photoclick’ cycloaddition) upon irradiation with 290-350 nm light.

In some embodiments, the R group in reagents (I) through (VIII) is a fluorescent molecule. In some specific embodiments, the R group is a fluorescent molecule selected from the group consisting of a coumarin derivative (e.g., Alexa™ dyes), a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative (e.g., CyDyes), a phthalocyanine derivative, and an oxazine derivative (e.g., resorufin).

In some embodiments, the R group in reagents (I) through (VIII) is an affinity label molecule. In some specific embodiments, the R group is biotin or a biotin analogue.

In some embodiments, the R group in reagents (I) through (VIII) is a polymer. In some specific embodiments, the R group is selected from the group consisting of a functionalized or non-functionalized linear poly(ethyleneglycol) molecule, and a functionalized or non-functionalized branched poly(ethyleneglycol) molecule. In some embodiments, the R group is a polyether of formula —(CH₂—CH₂—O)_(n)—, where n is an integer between 10 and 1000.
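To give a sense of scale for the polyether embodiment, the approximate average mass contributed by the —(CH₂—CH₂—O)_(n)— backbone can be estimated from standard atomic weights. The following Python sketch is illustrative only: the function name is hypothetical, the range of n is treated as inclusive by assumption, and the end groups, the linker L, and the remainder of the reagent are not included in the estimate.

```python
# Illustrative estimate of the polyether backbone mass for a given n.
# Standard atomic weights (g/mol); the helper name is hypothetical.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}

# One -(CH2-CH2-O)- repeat unit has the composition C2H4O.
REPEAT_UNIT_MASS = (
    2 * ATOMIC_WEIGHT["C"] + 4 * ATOMIC_WEIGHT["H"] + ATOMIC_WEIGHT["O"]
)


def polyether_backbone_mass(n):
    """Return the approximate average mass (Da) of n repeat units."""
    # Assumption: "between 10 and 1000" is read as an inclusive range.
    if not 10 <= n <= 1000:
        raise ValueError("n is expected to lie between 10 and 1000")
    return n * REPEAT_UNIT_MASS


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(polyether_backbone_mass(n), 1))
```

Under these assumptions, the stated range of n corresponds to a backbone mass of roughly 0.44 kDa (n = 10) to roughly 44 kDa (n = 1000).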

In some embodiments, the R group in reagents (I) through (VIII) is a water-soluble polymer. Such water-soluble polymers include, but are not limited to, polyethylene glycol, polyethylene glycol propionaldehyde, mono C₁-C₁₀ alkoxy or aryloxy derivatives thereof, monomethoxy-polyethylene glycol, polyvinyl pyrrolidone, polyvinyl alcohol, polyamino acids, divinylether maleic anhydride, N-(2-Hydroxypropyl)-methacrylamide, dextran, dextran derivatives including dextran sulfate, polypropylene glycol, polypropylene oxide/ethylene oxide copolymer, polyoxyethylated polyol, heparin, heparin fragments, polysaccharides, oligosaccharides, glycans, cellulose and cellulose derivatives, including but not limited to methylcellulose and carboxymethyl cellulose, serum albumin, starch and starch derivatives, polypeptides, polyalkylene glycol and derivatives thereof, copolymers of polyalkylene glycols and derivatives thereof, polyvinyl ethyl ethers, and alpha-beta-poly[(2-hydroxyethyl)-DL-aspartamide], and the like, or mixtures thereof.

In other embodiments, the R group in reagents (I) through (VIII) is a solid support. Accordingly, the methods provided herein can be applied to immobilize a target polypeptide onto a solid support. Because the functionalization procedure occurs site-specifically at the C-terminus of the target polypeptide, the orientation of the target polypeptide immobilized onto the solid support can be predicted and controlled. Such control of the orientation of the polypeptide attachment to the solid support can be useful, for example, in the evaluation of the biophysical properties of the polypeptide (e.g., via surface plasmon resonance, enzyme-linked immunoassay, and the like), for the construction of protein (micro)arrays, for the preparation of affinity chromatographic media, and related applications.

Examples of solid supports well known in the art that can be used include, but are not limited to, solid and semisolid matrixes, such as aerogels and hydrogels, resins, beads, biochips (including thin film coated biochips), microfluidic chips, silicon chips, multi-well plates (also referred to as microtitre plates or microplates), membranes, cells, conducting and nonconducting metals, glass (including microscope slides) and magnetic supports. Other non-limiting examples of solid supports used in the methods and compositions described herein include silica gels, polymeric membranes, particles, derivatized plastic films, derivatized glass, derivatized silica, glass beads, cotton, plastic beads, alumina gels, polysaccharides such as Sepharose, poly(acrylate), polystyrene, poly(acrylamide), polyol, agarose, agar, cellulose, dextran, starch, FICOLL, heparin, glycogen, amylopectin, mannan, inulin, nitrocellulose, diazocellulose, polyvinylchloride, polypropylene, polyethylene (including poly(ethylene glycol)), nylon, latex beads, magnetic beads, paramagnetic beads, superparamagnetic beads, and the like. In certain embodiments, the supports used in the methods and compositions described herein are supports used for surface analysis such as surface acoustic wave devices or devices utilizing evanescent wave analysis, such as surface plasmon resonance analysis. Other supports used in the methods and compositions described herein include, but are not limited to, resins used in peptide synthesis such as, by way of example only, polystyrene, PAM-resin, POLYHIPE™ resin, polyamide resin, polystyrene resin grafted with poly(ethylene glycol), polydimethyl-acrylamide resin and PEGA beads. The solid support can be, but is not limited to, in the form of a sheet, a multi-well plate, a bead or microbead, a slide, a microarray tray, and a test tube. Other suitable shapes and configurations for the solid support will also be recognized by the skilled artisan.

In certain embodiments, the surfaces of the solid supports can have reactive functional groups, which can be used to covalently or non-covalently link a reagent of formula (I) through (VIII) to the solid support. Such functional groups can include, but are not limited to, hydroxyl, carboxyl, halogen, nitro, cyano, amido, urea, carbonate, carbamate, isocyanate, sulfone, sulfonate, sulfonamide and sulfoxide groups. In other embodiments, the surfaces of the solid supports are covalently or non-covalently coated with streptavidin or avidin. In this case, reagents (I) through (VIII) comprising a biotin or biotin analogue within the R group can be linked to the solid support via a tight biotin-(strept)avidin non-covalent interaction.

In specific embodiments, the target polypeptide comprises one or more thioester groups. In preferred embodiments, the target polypeptide comprises a single thioester group. In most preferred embodiments, the target polypeptide comprises a single, C-terminal thioester group.

The thioester-comprising polypeptide may be synthetically or recombinantly produced. Several methods are known in the art to produce synthetic thioester-comprising polypeptides. For example, synthetic thioester-comprising peptides may be produced via solid-phase peptide synthesis (SPPS) using BOC chemistry and suitable resins for generating a C-terminal thioester upon cleavage of the polypeptide chain from the resin (Hojo et al., Bull. Chem. Soc. Jpn. 1993, 66, 2700-06). Alternatively, safety-catch linker resins can be used in combination with Fmoc-based SPPS to generate synthetic thioester-comprising peptides (Shin, Winans et al. 1999).

In preferred embodiments, the target polypeptide is a recombinant polypeptide. In most preferred embodiments, the target polypeptide which is to be functionalized is genetically fused to the N-terminus of an intein so that a thioester group is transiently formed at the junction between the target polypeptide and the intein via intein-catalyzed N,S acyl transfer as described above.

Accordingly, a method is also provided for linking a chemical entity or species to a recombinant polypeptide, the method comprising the steps:

-   -   d) providing a precursor polypeptide, the precursor polypeptide comprising the target polypeptide fused to the N-terminus of an intein;
    -   e) providing a chemical reagent of general formula (I), (II), (III), (IV), (V), (VI), (VII), or (VIII) as described above;
    -   f) allowing the precursor polypeptide to react with the chemical reagent so that a covalent linkage between the chemical reagent and the target polypeptide is formed with concomitant release of the intein.

In certain embodiments of the method, the intein to be fused to the C-terminus of the target polypeptide can be a naturally occurring intein, an engineered variant of a naturally occurring intein, a fusion of the N-terminal and C-terminal fragments of a naturally occurring split intein, or a fusion of the N-terminal and C-terminal fragments of an artificial split intein.

Nucleotide sequences encoding intein domains that can be used for preparing the biosynthetic precursors and self-processing biosynthetic precursors within the invention can be derived from naturally occurring inteins and engineered variants thereof. A rather comprehensive list of such inteins is provided by the Intein Registry (http://www.neb.com/neb/inteins.html). Inteins that can be used can include, but are not limited to, any of the naturally occurring inteins from organisms belonging to the Eucarya, Eubacteria, and Archaea. Among these, inteins of the GyrA group (e.g., Mxe GyrA, Mfl GyrA, Mgo GyrA, Mkas GyrA, Mle-TN GyrA, Mma GyrA), DnaB group (e.g., Ssp DnaB, Mtu-CDC1551 DnaB, Mtu-H37Rv DnaB, Rma DnaB), RecA group (e.g., Mtu-H37Rv RecA, Mtu-So93 RecA), RIR1 group (e.g., Mth RIR1, Chy RIR1, Pfu RIR1-2, Ter RIR1-2, Pab RIR1-3), and Vma group (e.g., Sce Vma, Ctr Vma) are preferred, and intein Mxe GyrA (SEQ ID NO:1) and the engineered ‘mini’ Ssp DnaB (‘eDnaB’, SEQ ID NO:2) are particularly preferred.

In particular, natural inteins whose self-splicing mechanism has been confirmed experimentally can be used within the invention. These include, but are not limited to, Mxe GyrA (SEQ ID NO:1), Ssp eDnaB (SEQ ID NO:2), Hsp-NRC1 CDC21 (SEQ ID NO:3), Ceu ClpP (SEQ ID NO:4), Tag Pol-1 (SEQ ID NO:5), Tfu Pol-1 (SEQ ID NO:6), Tko Pol-1 (SEQ ID NO:7), Psp-GBD Pol (SEQ ID NO:8), Tag Pol-2 (SEQ ID NO:9), Thy Pol-1 (SEQ ID NO:10), Tko Pol-2 (SEQ ID NO:11), Tli Pol-1 (SEQ ID NO:12), Tma Pol (SEQ ID NO:13), Tsp-GE8 Pol-1 (SEQ ID NO:14), Tthi Pol (SEQ ID NO:15), Tag Pol-3 (SEQ ID NO:16), Tfu Pol-2 (SEQ ID NO:17), Thy Pol-2 (SEQ ID NO:18), Tli Pol-2 (SEQ ID NO:19), Tsp-GE8 Pol-2 (SEQ ID NO:20), Pab Pol-II (SEQ ID NO:21), Mtu-CDC1551 DnaB (SEQ ID NO:22), Mtu-H37Rv DnaB (SEQ ID NO:23), Rma DnaB (SEQ ID NO:24), Ter DnaE-1 (SEQ ID NO:25), Ssp GyrB (SEQ ID NO:26), Mfl GyrA (SEQ ID NO:27), Mgo GyrA (SEQ ID NO:28), Mkas GyrA (SEQ ID NO:29), Mle-TN GyrA (SEQ ID NO:30), Mma GyrA (SEQ ID NO:31), Ssp DnaX (SEQ ID NO:32), Pab Lon (SEQ ID NO:33), Mja PEP (SEQ ID NO:34), Afu-FRR0163 PRP8 (SEQ ID NO:35), Ani-FGSCA4 PRP8 (SEQ ID NO:36), Cne-A PRP8 (SEQ ID NO:37), Hca PRP8 (SEQ ID NO:38), Pch PRP8 (SEQ ID NO:39), Pex PRP8 (SEQ ID NO:40), Pvu PRP8 (SEQ ID NO:41), Mtu-H37Rv RecA (SEQ ID NO:42), Mtu-So93 RecA (SEQ ID NO:43), Mfl RecA (SEQ ID NO:44), Mle-TN RecA (SEQ ID NO:45), Nsp-PCC7120 RIR1 (SEQ ID NO:76), Ter RIR1-1 (SEQ ID NO:46), Pab RIR1-1 (SEQ ID NO:47), Pfu RIR1-1 (SEQ ID NO:48), Chy RIR1 (SEQ ID NO:49), Mth RIR1 (SEQ ID NO:50), Pab RIR1-3 (SEQ ID NO:51), Pfu RIR1-2 (SEQ ID NO:52), Ter RIR1-2 (SEQ ID NO:53), Ter RIR1-4 (SEQ ID NO:54), CIV RIR1 (SEQ ID NO:55), Ctr VMA (SEQ ID NO:56), Sce VMA (SEQ ID NO:57), Tac-ATCC25905 VMA (SEQ ID NO:58), Ssp DnaB (SEQ ID NO:59).

Putative (‘theoretical’) inteins can also be used within the invention, provided they are able to catalyze the required N,S acyl transfer reaction. This property can be established experimentally based on the ability of intein-fused polypeptides to splice in the presence of thiophenol or other thiols. These putative inteins include, but are not limited to, Gth DnaB (GenBank accession number 078411), Ppu DnaB (GenBank accession number P51333), Mfl RecA (GenBank accession number not given), Mle DnaB (GenBank accession number CAA17948.1), Mja KIbA (GenBank accession number Q58191), Pfu KIbA (PF_949263 in UMBI), Pfu IF2 (PF_1088001 in UMBI), Pho Lon (GenBank accession number Baa29538.1), Mja r-Gyr (GenBank accession number G64488), Pho RFC (GenBank accession number F71231), Pab RFC-2 (GenBank accession number C75198), Mja RtcB (GenBank accession number Q58095), Pho VMA (NT01PH1971 in Tigr), AP-APSE1 dpol (AAF03988.1 in NCBI), Bde-JEL197 RPB2 (ABC17934 in NCBI), CbP-C-St RNR (BAE47774 in NCBI), CCy Hyp1-Csp-1 (EAZ88681.1 in NCBI), CCy Hyp1-Csp-2 (ACB52109.1 in NCBI), Cne-AD PRP8 (AAX39419 in NCBI), Cth-ATCC27405 TerA (ACG65137.1 in NCBI), Ctr ThrRS (CZ284364 in NCBI), Dhan GLT1 (AAW82371.1 in NCBI), Dra Snf2 (7471820 in NCBI), Hwa MCM-3 (YP_003131067 in NCBI), Hwa PolB-1 (CAJ51833 in NCBI), Mca MupF (NP_852755 in NCBI), Mja Klba (Q58191 in NCBI), Mja PEP (ZP_00175589 in NCBI), Mja RFC-1 (YP_659332 in NCBI), Mja RFC-3 (ABR56888.1 in NCBI), Mja RNR-1 (ACI21751.1 in NCBI), Mja RNR-2 (H64403 in NCBI), Mja rPol A″ (CAJ53490 in NCBI), Mja UDP GD (ZP_01799256.1 in NCBI), MP-Be gp51 (AAR89772 in NCBI), Mtu SufB (NP_855148.1 in NCBI), Npu GyrB (ZP_01622715.1 in NCBI), Pfu RIR1-2 (ABM31270 in NCBI), Pho CDC21-2 (YP_137231 in NCBI), Pho CDC21-2 (CAJ53749.1 in NCBI), Pho LHR (ZP_06213967.1 in NCBI), Pho Pol-II (YP_001403293.1 in NCBI), Pho RadA (YP_288864 in NCBI), PI-PKoI (YP_003246437.1 in NCBI), Pko Pol-1 (ZP_06214852.1 in NCBI), Psy Fha (AAY90835 in NCBI), ShP-Sfv-5 Primase (ABY49883.1 in NCBI), Ssp DnaX (ZP_03271562.1 in NCBI), Ter DnaE-1 (YP_002730690.1 in NCBI), Ter DnaE-2 (YP_002616796 in NCBI), Ter RIR1-4 (ZP_03765843.1 in NCBI), and Tth-HB8-2 DnaE (TIGR contig:4743).

In other variations, intein sequences that can be used within the invention can be derived by fusing together the N-fragment and C-fragment of a naturally occurring split intein. Split inteins include, but are not limited to, Ssp DnaE (SEQ ID NO:60-SEQ ID NO:61), Neq Pol (SEQ ID NO:62-SEQ ID NO:63), Asp DnaE (SEQ ID NO:64-SEQ ID NO:65), Npu-PCC73102 DnaE (SEQ ID NO:66-SEQ ID NO:67), Nsp-PCC7120 DnaE (SEQ ID NO:68-SEQ ID NO:69), Oli DnaE (SEQ ID NO:70-SEQ ID NO:71), Ssp-PCC7002 DnaE (SEQ ID NO:72-SEQ ID NO:73), Tvu DnaE (SEQ ID NO:74-SEQ ID NO:75).

In preferred embodiments, the intein fused to the C-terminus of the target polypeptide is an engineered variant of a natural intein, which has been modified so that the ability of the intein to undergo C-terminal splicing is minimized or prevented. According to strategies well known in the art, this can be achieved, for example, by using an intein comprising no C-extein unit, or by removing the C-terminal amino acid in the intein (most typically, an asparagine or histidine residue), or by mutating the latter to an unreactive amino acid residue (e.g., via substitution to an alanine or glycine). Examples of the latter approach are provided in Section 6, Examples, below.
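As a purely illustrative sketch of the sequence-level modifications just described, the following Python functions remove the C-terminal residue of an intein sequence or substitute it with alanine or glycine. The function names are hypothetical, and the short sequence used in the example is an arbitrary placeholder rather than an actual intein sequence; an actual design would start from one of the sequences referenced herein.

```python
# Illustrative sketch only: sequence edits intended to minimize or
# prevent C-terminal splicing. Function names are hypothetical.

SPLICING_RESIDUES = ("N", "H")   # typical C-terminal intein residues
UNREACTIVE_RESIDUES = ("A", "G")  # substitutions named in the text


def mutate_c_terminal_residue(intein_seq, replacement="A"):
    """Replace the C-terminal Asn/His with an unreactive residue."""
    if replacement not in UNREACTIVE_RESIDUES:
        raise ValueError("replacement is expected to be 'A' or 'G'")
    if not intein_seq or intein_seq[-1] not in SPLICING_RESIDUES:
        raise ValueError("expected a C-terminal asparagine or histidine")
    return intein_seq[:-1] + replacement


def delete_c_terminal_residue(intein_seq):
    """Remove the C-terminal residue of the intein."""
    if not intein_seq:
        raise ValueError("empty sequence")
    return intein_seq[:-1]


if __name__ == "__main__":
    placeholder = "CLSGDTKVHN"  # arbitrary placeholder, not a real intein
    print(mutate_c_terminal_residue(placeholder))        # Asn -> Ala
    print(mutate_c_terminal_residue(placeholder, "G"))   # Asn -> Gly
    print(delete_c_terminal_residue(placeholder))
```

The check on the C-terminal residue is a convenience for this example; inteins ending in other residues would require the check to be relaxed.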

In the precursor polypeptide, the nature of the amino acid residues preceding the intein can affect the extent of premature hydrolysis during protein expression as well as the efficiency by which the reagents of formula (I) through (VIII) undergo ligation to the C-terminus of the target polypeptide. In particular, the inventors found that the last three C-terminal amino acid residues preceding the intein in the precursor polypeptide can affect the ligation efficiency, whereas the last residue preceding the intein can also affect the extent of premature hydrolysis of the precursor polypeptide during protein expression. These amino acid residues are here referred to as “I-1”, “I-2”, and “I-3” to indicate, respectively, the last, penultimate and antepenultimate amino acid residue of the target polypeptide prior to the intein protein in the primary sequence of the precursor polypeptide. For example, it was found that when the intein is Mxe GyrA intein (SEQ ID NO:1), most efficient functionalization of the target polypeptide was achieved with the I-1 amino acid residue being F, Y, A, T, W, N, R or Q, the I-2 amino acid residue being G, P, or S, and the I-3 amino acid residue being G or S. It is expected that different structure-reactivity trends may be observed in the case of other inteins. In these cases, studies such as those described in (Frost, Vitali et al. 2013) can be carried out to identify optimal C-terminal amino acid residues for maximizing the efficiency of ligation of reagents (I)-(VIII) to a target polypeptide.

Accordingly, in specific embodiments, the precursor polypeptide consists of Mxe GyrA intein (SEQ ID NO:1), or an engineered variant thereof, fused to the C-terminus of a target polypeptide comprising one or more of the features selected from: I-1 is F, Y, A, T, W, N, R or Q; I-2 is G, P, or S; I-3 is G or S.
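The residue preferences listed above for Mxe GyrA intein lend themselves to a simple sequence check. The following Python sketch is given by way of example only; the function name and the example target sequences are hypothetical, and the preferred residue sets are those stated above for Mxe GyrA, which may not apply to other inteins.

```python
# Illustrative sketch only: report whether the last three residues of a
# target polypeptide match the preferences stated for Mxe GyrA intein.
PREFERRED_RESIDUES = {
    "I-1": set("FYATWNRQ"),  # last residue before the intein
    "I-2": set("GPS"),       # penultimate residue
    "I-3": set("GS"),        # antepenultimate residue
}


def check_junction(target_seq):
    """Map I-1, I-2, I-3 to (residue, matches_preference)."""
    seq = target_seq.upper()
    if len(seq) < 3:
        raise ValueError("target sequence must have at least 3 residues")
    residues = {"I-1": seq[-1], "I-2": seq[-2], "I-3": seq[-3]}
    return {
        position: (aa, aa in PREFERRED_RESIDUES[position])
        for position, aa in residues.items()
    }


if __name__ == "__main__":
    # Hypothetical target sequences used only to exercise the check.
    for seq in ("MKLVGSY", "MKLVDEK"):
        print(seq, check_junction(seq))
```

Because the embodiments recite "one or more" of these features, each position is reported separately instead of being combined into a single pass/fail result.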

In some embodiments, a genetically encoded affinity tag is fused to the C-terminus of the intein. In this way, the precursor target polypeptide-intein fusion protein can be readily isolated after recombinant expression using affinity chromatography. This procedure can also facilitate the isolation of the desired functionalized polypeptide product via, for example, first immobilizing the polypeptide-intein fusion protein onto a solid support (e.g., affinity resin bead), and then contacting the immobilized protein with the reagents of formula (I) through (VIII) so that, upon functionalization, the functionalized polypeptide is released into the solution and the intein remains bound to the solid support.

In some embodiments, an affinity tag is linked to the N-terminus of the target polypeptide. In this way, the target thioester-comprising polypeptide or the precursor target polypeptide-intein fusion protein can be readily purified using affinity chromatography. This procedure can also facilitate the isolation of the functionalized target polypeptide via, for example, immobilizing the precursor polypeptide-intein fusion protein onto a solid support (e.g., affinity resin bead), and contacting the immobilized protein with the reagents of formula (I) through (VIII), so that, upon functionalization, the intein is released into the solution and the functionalized polypeptide remains bound to the solid support. After washing of the solid support, the functionalized polypeptide can then be recovered by competitive elution or by changing the buffer composition (e.g., changing pH).

Several affinity tags are known in the art, which can be used for the specific applications described above. Examples of these affinity tags include, but are not limited to, a polyhistidine tag (e.g., HHHHHH) (SEQ ID NO:77), an Avi-Tag (SGLNDIFEAQKIEWHELEL) (SEQ ID NO:78), a FLAG tag (DYKDDDDK) (SEQ ID NO:79), a Strep-tag II (WSHPQFEK) (SEQ ID NO:80), a c-myc tag (EQKLISEEDL) (SEQ ID NO:81), an S-Tag (KETAAAKFERQHMDS) (SEQ ID NO:82), a calmodulin-binding peptide (KRRWKKNFIAVSAANRFKKISSSGAL) (SEQ ID NO:83), a streptavidin-binding peptide (MDEKTTGWRGGHVVEGLAGELEQLRARLEHHPQGQREP) (SEQ ID NO:84), a chitin-binding domain (CBD), a glutathione S-transferase (GST), and a maltose-binding protein (MBP).
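For illustration, the shorter peptide tags listed above can be located in a construct sequence with a simple substring search. The following Python sketch is not part of the disclosed methods; the function name and the example construct are hypothetical, and only the six shorter tag sequences are included.

```python
# Illustrative sketch only: locate short peptide affinity tags in a
# protein sequence by exact substring match.
AFFINITY_TAGS = {
    "polyhistidine": "HHHHHH",         # SEQ ID NO:77
    "Avi-Tag": "SGLNDIFEAQKIEWHELEL",  # SEQ ID NO:78
    "FLAG": "DYKDDDDK",                # SEQ ID NO:79
    "Strep-tag II": "WSHPQFEK",        # SEQ ID NO:80
    "c-myc": "EQKLISEEDL",             # SEQ ID NO:81
    "S-Tag": "KETAAAKFERQHMDS",        # SEQ ID NO:82
}


def find_affinity_tags(sequence):
    """Return {tag name: 0-based index of first occurrence}."""
    seq = sequence.upper()
    hits = {}
    for name, tag in AFFINITY_TAGS.items():
        index = seq.find(tag)
        if index != -1:
            hits[name] = index
    return hits


if __name__ == "__main__":
    # Hypothetical construct: N-terminal His tag followed by a FLAG tag.
    construct = "MHHHHHHSSGDYKDDDDKAAA"
    print(find_affinity_tags(construct))
```

An exact-match search does not detect tag variants (for example, polyhistidine tags of a different length beyond the first six residues) and is intended only as a starting point.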

In addition to direct fusion of the target polypeptide to the N-terminus of an intein as described above, a target thioester-comprising polypeptide may be produced in certain embodiments by reacting a precursor polypeptide (i.e., an intein-fused target polypeptide) with a thiol, such as, for example, thiophenol, benzyl mercaptan, sodium 2-mercaptoethane sulfonate (MESNA), beta-mercaptoethanol, dithiothreitol (DTT), and the like. This reaction results in the formation of a C-terminal thioester polypeptide (with concomitant release of the intein), which can then be functionalized at the C-terminus according to the methods described above.

In another embodiment, a recombinant intein-fused target polypeptide can be produced by introducing a polynucleotide encoding the polypeptide construct into an expression vector, introducing the resulting vector into an expression host, and inducing the expression of the encoded polypeptide. Numerous methods for making nucleic acids encoding peptides of a known or random sequence are known to a person skilled in the art. For example, polynucleotides having a predetermined sequence can be prepared chemically by solid phase synthesis using commercially available equipment and reagents. Polynucleotides can then be amplified using a polymerase chain reaction, digested via endonucleases, and ligated together according to standard molecular biology protocols known in the art (e.g., Sambrook et al. Molecular Cloning: A Laboratory Manual (Third Edition), Cold Spring Harbor Press, 2001). Suitable vectors for protein expression include chromosomal, nonchromosomal and synthetic DNA sequences, e.g., derivatives of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus, pseudorabies, adeno-associated viruses, retroviruses and many others. Any vector that transduces genetic material into a cell and that, if replication is desired, is replicable and viable in the relevant host can be used. A large number of expression vectors and expression hosts are known in the art, and many of these are commercially available. Expression hosts that may be used for the preparation of the precursor polypeptide within the invention include any system that supports the transcription, translation, and/or replication of a nucleic acid. These systems include prokaryotes such as bacteria (e.g., Escherichia coli) and eukaryotes such as yeast, insect, and mammalian cells. These systems also include lysates of prokaryotic cells (e.g., bacterial cells) and lysates of eukaryotic cells (e.g., yeast, insect, or mammalian cells). These systems also include in vitro transcription/translation systems, many of which are commercially available. The choice of the expression vector and host system depends on the type of application intended for the methods provided herein, and a person skilled in the art will be able to select a suitable expression host based on known features and applications of the different expression hosts.

As demonstrated herein, the functionalization methods provided herein can be used for the site-specific functionalization of a target polypeptide in vitro, in a complex biologically-derived medium (e.g., cell lysate), or in the context of a cell (e.g., in a cell (for example, in the cytoplasm or another cellular compartment) or on a cell (for example, associated with the exterior surface of a cell membrane)).

In the context of a cell, a thioester-comprising polypeptide can be generated by recombinantly expressing the target polypeptide as fused to the N-terminus of a natural intein, or engineered variant thereof, so that a thioester group is transiently formed at the junction between the polypeptide and the intein by intein-catalyzed N,S acyl transfer as described above. The resulting precursor polypeptide can be soluble (i.e., not membrane-bound), covalently bound to a membrane of the cell, or non-covalently associated to a membrane of the cell.

Accordingly, in some embodiments, the precursor polypeptide that is to be targeted for functionalization using the methods provided herein is in a cell. In this case, the functionalization procedure involves (i) exposing the cell to one of the reagents of formula (I), (II), (III), (IV), (V), (VI), (VII), or (VIII), and (ii) allowing the precursor polypeptide to react with the chemical reagent so that a covalent linkage between the chemical reagent and the target polypeptide is formed with concomitant release of the intein. Virtually any cells, prokaryotic or eukaryotic, which can be transformed with heterologous DNA or RNA to direct the expression of a precursor polypeptide consisting of a target polypeptide C-terminally fused to an intein, and which can be grown in culture, may be used within the scope of the invention. Accordingly, in one embodiment, the cell is a bacterial cell, while in another it is a eukaryotic cell. Examples of bacterial cells include, but are not limited to, Escherichia coli. Examples of eukaryotic cells include, but are not limited to, a mammalian cell, a zebrafish cell, a Xenopus cell, a C. elegans cell, a yeast cell (e.g., Saccharomyces cerevisiae), an insect cell (e.g., Drosophila cell), a plant cell, and the like.

In other embodiments, derivatives of the reagents (I), (II), (III), (IV), (V), (VI), (VII), and (VIII) such as salts, esters, N-protected, S-protected derivatives are provided. Such derivatives can be routinely produced by one of ordinary skill in the art.

5.2. Kits

The invention also provides kits for carrying out the methods provided herein for functionalization of peptides and/or proteins, for ligation of peptides or proteins to various chemical species and/or for immobilization of functionalized peptides or proteins onto one or more surfaces. Such kits may comprise a carrier, such as a box, carton, tube or the like, adapted to receive one or more containers, such as vials, tubes, ampules, bottles and the like. Containers of the kit comprise selected amounts of one or more compounds, reagents, or buffers or solvents useful in carrying out a method provided herein.

In specific embodiments, a kit comprises one or more reagents of chemical formula (I) through (VIII). In more specific embodiments, a kit can comprise one or more reagents of chemical formula (I) through (VIII), in which the R group is selected from the group consisting of a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, a quantum dot, and any combination thereof.

Kits may further comprise one or more additional components necessary for carrying out one or more particular applications of the methods and reagents of the present invention. For example, the kit may comprise one or more chemical species which are to be ligated to a peptide or protein employing the methods and/or reagents provided herein. In a specific example, the kit can provide one or more reagents of formula (I) through (VIII), in which the R group comprises one or more bioorthogonal functional groups selected from the group consisting of hydrazino (—NHNH₂), hydrazido (—C(O)NHNH₂), oxyamino (—ONH₂), azido (—N₃), alkynyl (—C≡CR′), alkenyl (—CR′═CR′₂), phosphine (—PR′₂), 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups. The kit can comprise a chemical species or a functionalized solid support which can be reacted with the bioorthogonal group in order to attach the target polypeptide to the chemical species or solid support. In another specific example, the kit can provide one or more reagents of formula (I) through (VIII), in which the R group comprises a fluorescent molecule selected from the group consisting of a coumarin derivative (e.g., Alexa™ dyes), a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative (e.g., CyDyes), a phthalocyanine derivative, and an oxazine derivative (e.g., resorufin). In another specific example, the kit can provide one or more reagents of formula (I) through (VIII), in which the R group comprises a biotin or biotin analogue.

In general, kits may also comprise one or more buffers, reaction containers or tools for carrying out the functionalization of the target polypeptide(s), means for purification of the functionalized polypeptide(s), control samples, one or more sets of instructions, and the like.

In another specific embodiment, the invention provides a kit which comprises reagents, buffers and one or more other components for forming a thioester-comprising polypeptide by intein-mediated splicing. Such kits can also comprise, in certain embodiments, a surface upon which the protein thioester is formed for subsequent reaction with a reagent provided herein. Such kits can further comprise one or more reagents provided herein, one or more buffers for carrying out a method provided herein, one or more surfaces for immobilization of the functionalized polypeptide(s), one or more chemical species for attachment to the functionalized polypeptide(s), one or more means for assaying the functionalized polypeptide(s) and instructions for carrying out one or more of the methods provided herein.

In a specific embodiment, a kit is provided for forming a covalent linkage between a polypeptide and a chemical species, the kit comprising:

-   a. at least one chemical reagent of formula (I), (II), (III), (IV), (V), (VI), (VII), or (VIII), or a salt of the reagent; and
-   b. one or a plurality of containers, wherein at least one container comprises a pre-selected or desired amount of at least one of the chemical reagents of formula (I), (II), (III), (IV), (V), (VI), (VII), or (VIII), or a salt of the reagent, wherein:
    -   i. R is the chemical species which is to be covalently linked to the polypeptide,
    -   ii. R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group,
    -   iii. X, Y, W, and Z are hydrogen and/or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group,
    -   iv. n is 2 or 3, and
    -   v. L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)—, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In another embodiment of the kit, R is a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, or a quantum dot, or any combination thereof.

In another embodiment of the kit, R is a bioorthogonal functional group selected from the group consisting of —NR′NR′₂, —C(O)NR′NR′₂, —ONH₂, —N₃, —C≡CR′, —CR′═CR′₂, —PR′₂, 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In another embodiment of the kit, R is a fluorescent molecule selected from the group consisting of a coumarin derivative, a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative, a phthalocyanine derivative, and an oxazine derivative.

In another embodiment of the kit, R is biotin, a biotin analogue, or a perfluorinated alkyl chain CF₃—(CF₂)_(m)— where m=3-15.

In another embodiment of the kit, the at least one reagent comprises at least one compound selected from the group consisting of:

-   a. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂ or —N₃, and
    -   L is a single bond;
-   b. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂, and
    -   L is a linker or linker group of formula

-   c. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is 7-amino-4-(trifluoromethyl)-2H-chromen-2-one, and
    -   L is —C(O)NHCH₂C(O)—; or
-   d. a compound of formula (I), wherein:
    -   R₁, X, Y, and Z are hydrogen atoms,
    -   R is biotin, and
    -   L is —C(O)NH(CH₂)₃NH—.

In another embodiment of the kit, the kit further comprises a functionalized solid support with which the functional group R reacts. Functionalized solid supports and surfaces with which functional groups R can react are well known in the art.

In another specific embodiment, a kit is provided for immobilizing a polypeptide to a surface, the kit comprising:

-   a. a chemical reagent of formula (Ib), (IIb), (IIIb), (IVb), (Vb), (VIb), (VIIb), or (VIIIb):

and

-   b. one or a plurality of containers, wherein at least one container comprises a surface to which a chemical reagent of formula (Ib), (IIb), (IIIb), (IVb), (Vb), (VIb), (VIIb), or (VIIIb) is covalently bound, and wherein:
    -   v. R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group,
    -   vi. X, Y, W, and Z are hydrogen or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, and wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group,
    -   vii. n is 2 or 3, and
    -   viii. L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)—, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In one embodiment of the kit, the surface is a solid support.

In another embodiment of the kit, the solid support is a resin, a nanoparticle, or the surface of a microarray.

5.3. Compounds and Compositions

Compounds and compositions are also provided. These compounds and compositions can be used as reagents (also referred to herein as “chemical reagents”) according to the methods provided herein.

Examples 1-4 set forth methods that can be used to synthesize the compounds and compositions.

A compound (also referred to herein as a “reagent”, a “chemical reagent” or a “composition”) is provided having the formula (I), (II), (III), (IV), (V), (VI), (VII) or (VIII):

or a salt thereof, wherein:

-   i. R is a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, a quantum dot, or any combination thereof,
-   ii. R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group,
-   iii. X, Y, W, and Z are hydrogen and/or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, wherein each R′ is independently H, alkyl, or substituted alkyl,
-   iv. n is 2 or 3; and
-   v. L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)— group, where each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In one embodiment of the compound, R is a bioorthogonal functional group selected from the group consisting of —NR′NR′₂, —C(O)NR′NR′₂, —ONH₂, —N₃, —C≡CR′, —CR′═CR′₂, —PR′₂, 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups, and each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.

In another embodiment of the compound, R is a fluorescent molecule selected from the group consisting of a coumarin derivative, a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative, a phthalocyanine derivative, and an oxazine derivative.

In another embodiment of the compound, R is biotin, a biotin analogue, or a perfluorinated alkyl chain CF₃—(CF₂)_(m)— where m=3-15.

In another embodiment of the compound, R is a poly(ethyleneglycol) molecule.

In another embodiment of the compound, R is a resin or a nanoparticle.

In another embodiment of the compound, R is a functionalized surface.

In another embodiment of the compound, R₁, X, Y, and Z are hydrogen atoms,

-   L is selected from the group consisting of —C(O)NR′—, —C(O)NR′CH₂C(O)—, —C(O)NR′(CH₂)_(n)—, and —C(O)NR′(CH₂—CH₂—O)_(n)—,
-   R′ is a hydrogen, alkyl or aryl group, and
-   n is an integer number from 1 to 15.

In another embodiment of the compound, R is selected from the group consisting of biotin, a biotin analogue, and a coumarin derivative.

In another embodiment of the compound, the compound has formula (I), wherein:

-   a. R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂ or —N₃, and
    -   L is a single bond;
-   b. R₁, X, Y, and Z are hydrogen atoms,
    -   R is —ONH₂, and
    -   L is a linker or linker group of formula

-   c. R₁, X, Y, and Z are hydrogen atoms,
    -   R is 7-amino-4-(trifluoromethyl)-2H-chromen-2-one, and
    -   L is —C(O)NHCH₂C(O)—; or
-   d. R₁, X, Y, and Z are hydrogen atoms,
    -   R is biotin, and
    -   L is —C(O)NH(CH₂)₃NH—.

The compositions and reagents encompassed by the invention may comprise one or more chiral centers. Accordingly, the compounds are intended to include racemic mixtures, diastereomers, enantiomers, and mixtures enriched in one or more stereoisomers. When a group of substituents is disclosed herein, all the individual members of that group and all subgroups, including any isomers, enantiomers, and diastereomers, are intended to be included in the disclosure. Additionally, all isotopic forms of the compounds provided herein are intended to be included in the disclosure. For example, it is understood that any one or more hydrogens in a molecule disclosed herein can be replaced with deuterium or tritium.

A skilled artisan will appreciate that starting materials, biological materials, reagents, synthetic methods, purification methods, analytical methods, assay methods, and biological methods other than those specifically exemplified can be employed in the practice of the invention. All art-known functional equivalents of any such materials and methods are intended to be included in the invention.

Unless otherwise indicated, the disclosure is not limited to specific molecular structures, substituents, synthetic methods, reaction conditions, or the like, as such may vary. It is to be understood that the embodiments are not limited to particular compositions or biological systems, which can, of course, vary.

5.4. Uses for the Methods, Kits and Compositions

Efficient methods for C-terminal functionalization of a protein can be used for protein labeling or immobilization under non-disruptive conditions.

The methods provided herein for protein C-terminal labeling and/or immobilization are characterized by faster reaction kinetics than current methods known in the art, and have high labeling efficiencies, in particular at short reaction times. According to the methods provided herein, much lower concentrations of reagents (either the target C-terminal thioester protein, or the labeling reagent, or both) are needed to achieve satisfactory yields of the desired protein-functionalized product. Furthermore, thiol catalysts such as, for example, thiophenol, mercaptoethanol, or MESNA, are not required to expedite and/or increase the yields of the protein-functionalization methods provided herein. The methods provided herein can thus be used at the intracellular level for in vivo protein labeling applications. Furthermore, the rapid protein labeling methods provided herein enable the detection and isolation of transient or short-lived protein species in the context of proteomic or cell biology studies. Finally, certain proteins with limited stability, which may not be compatible with the need for high reagent or catalyst concentrations associated with other methods known in the art, can be functionalized and/or immobilized using the methods provided herein.
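The relationship between reaction rate and the reagent concentration needed for a given yield can be illustrated with a simple rate model. The following Python sketch is an illustration only: it assumes an irreversible second-order reaction between the thioester polypeptide and the labeling reagent at equal starting concentrations, and the function names, rate constants, and reaction time are hypothetical values chosen for the example, not measurements from this disclosure.

```python
def fraction_labeled(k, c0, t):
    """Fraction of polypeptide converted to product.

    Assumes (for illustration) an irreversible reaction A + B -> P that is
    first order in each reactant, with equal starting concentrations c0.
    The integrated rate law 1/[A] = 1/c0 + k*t gives
    fraction = k*c0*t / (1 + k*c0*t).

    k  : second-order rate constant in M^-1 s^-1 (hypothetical)
    c0 : starting concentration of each reactant in M
    t  : reaction time in s
    """
    x = k * c0 * t
    return x / (1.0 + x)


def concentration_for_yield(k, t, target):
    """Starting concentration (M) needed to reach `target` fraction in time t.

    Obtained by inverting fraction_labeled; target must be in (0, 1).
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return target / ((1.0 - target) * k * t)


if __name__ == "__main__":
    one_hour = 3600.0  # seconds
    # Hypothetical rate constants for a slower and a faster labeling reaction.
    for label, k in (("slower reaction", 1.0), ("faster reaction", 100.0)):
        c0 = concentration_for_yield(k, one_hour, target=0.90)
        print(f"{label}: k = {k} M^-1 s^-1, c0 for 90% in 1 h = {c0:.2e} M")
```

Under this model the required concentration scales inversely with the rate constant, so a hundredfold faster reaction reaches the same yield in the same time at a hundredfold lower reagent concentration.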

5.5. Terms and Expressions

The terms and expressions that are employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the concepts herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of the invention as defined by the appended claims.

Unless otherwise stated herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.

The term “functional group” as used herein refers to a contiguous group of atoms that, together, may undergo a chemical reaction under certain reaction conditions. Examples of functional groups are, among many others, —OH, —NH₂, —SH, —(C═O)—, —N₃, —C≡CH.

The term “aliphatic” is used in the conventional sense to refer to an open-chain or cyclic, linear or branched, saturated or unsaturated hydrocarbon group, including but not limited to alkyl, alkenyl, and alkynyl groups. The term “heteroatom-comprising aliphatic” as used herein refers to an aliphatic moiety where at least one carbon atom is replaced with a non-carbon atom, e.g., oxygen, nitrogen, sulphur, selenium, phosphorus, or silicon, and typically oxygen, nitrogen, or sulphur.

The terms “alkyl” and “alkyl group” as used herein refer to a linear, branched, or cyclic saturated hydrocarbon typically comprising 1 to 24 carbon atoms, preferably 1 to 12 carbon atoms, such as methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, t-butyl, octyl, decyl and the like. The term “heteroatom-comprising alkyl” as used herein refers to an alkyl moiety where at least one carbon atom is replaced with a heteroatom, e.g., oxygen, nitrogen, sulphur, phosphorus, or silicon, and typically oxygen, nitrogen, or sulphur.

The terms “alkenyl” and “alkenyl group” as used herein refer to a linear, branched, or cyclic hydrocarbon group of 2 to 24 carbon atoms, preferably of 2 to 12 carbon atoms, comprising at least one double bond, such as ethenyl, n-propenyl, isopropenyl, n-butenyl, isobutenyl, octenyl, decenyl, and the like. The term “heteroatom-comprising alkenyl” as used herein refers to an alkenyl moiety where at least one carbon atom is replaced with a heteroatom.

The terms “alkynyl” and “alkynyl group” as used herein refer to a linear, branched, or cyclic hydrocarbon group of 2 to 24 carbon atoms, preferably of 2 to 12 carbon atoms, comprising at least one triple bond, such as ethynyl, n-propynyl, and the like. The term “heteroatom-comprising alkynyl” as used herein refers to an alkynyl moiety where at least one carbon atom is replaced with a heteroatom.

The terms “aryl” and “aryl group” as used herein refer to an aromatic substituent comprising a single aromatic ring or multiple aromatic rings that are fused together, directly linked, or indirectly linked (such as linked through a methylene or an ethylene moiety). Preferred aryl groups comprise 5 to 24 carbon atoms, and particularly preferred aryl groups comprise 5 to 14 carbon atoms. The term “heteroatom-comprising aryl” as used herein refers to an aryl moiety where at least one carbon atom is replaced with a heteroatom.

The terms “alkoxy” and “alkoxy group” as used herein refer to an aliphatic group or a heteroatom-comprising aliphatic group bound through a single, terminal ether linkage. Preferred alkoxy groups comprise 1 to 24 carbon atoms, and particularly preferred alkoxy groups comprise 1 to 14 carbon atoms. The terms “aryloxy” and “aryloxy group” as used herein refer to an aryl group or a heteroatom-comprising aryl group bound through a single, terminal ether linkage. Preferred aryloxy groups comprise 5 to 24 carbon atoms, and particularly preferred aryloxy groups comprise 5 to 14 carbon atoms.

The terms “halo” and “halogen” are used in the conventional sense to refer to a fluoro, chloro, bromo or iodo substituent. By “substituted” it is intended that in the alkyl, alkenyl, alkynyl, aryl, or other moiety, at least one hydrogen atom is replaced with one or more “substituents”.

The term “substituents” refers to a contiguous group of atoms. Examples of “substituents” include, but are not limited to: alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, hydroxyl (—OH), sulfhydryl (—SH), substituted sulfhydryl, carbonyl (—CO—), thiocarbonyl (—CS—), carboxy (—COOH), amino (—NH₂), substituted amino, nitro (—NO₂), nitroso (—NO), sulfo (—SO₂—OH), cyano (—C≡N), cyanato (—O—C≡N), thiocyanato (—S—C≡N), formyl (—CO—H), thioformyl (—CS—H), phosphono (—P(O)(OH)₂), substituted phosphono, and phospho (—PO₂).

The term “contact” as used herein with reference to interactions of chemical units indicates that the chemical units are at a distance that allows short-range non-covalent interactions (such as van der Waals forces, hydrogen bonding, hydrophobic interactions, electrostatic interactions, dipole-dipole interactions) to dominate the interaction of the chemical units. For example, when a protein is “contacted” with a chemical species, the protein is allowed to interact with the chemical species so that a reaction between the protein and the chemical species can occur.

The term “bioorthogonal” as used herein with reference to a reaction, reagent, or functional group, indicates that such reaction, reagent, or functional group does not exhibit significant or detectable reactivity towards biological molecules such as those present in a bacterial, yeast or mammalian cell. The biological molecules can be, e.g., proteins, nucleic acids, fatty acids, or cellular metabolites.

In general, the term “mutant” or “variant” as used herein with reference to a molecule such as a polynucleotide or polypeptide, indicates that such molecule has been mutated from the molecule as it exists in nature. In particular, the terms “mutate” and “mutation” as used herein indicate any modification of a nucleic acid and/or polypeptide that results in an altered nucleic acid or polypeptide. Mutations include any process or mechanism resulting in a mutant protein, enzyme, polynucleotide, or gene. A mutation can occur in a polynucleotide or gene sequence, by point mutations, deletions, or insertions of single or multiple nucleotide residues. A mutation in a polynucleotide includes mutations arising within a protein-encoding region of a gene as well as mutations in regions outside of a protein-encoding sequence, such as, but not limited to, regulatory or promoter sequences. A mutation in a coding polynucleotide such as a gene can be “silent”, i.e., not reflected in an amino acid alteration upon expression, leading to a “sequence-conservative” variant of the gene. A mutation in a polypeptide includes but is not limited to mutation in the polypeptide sequence and mutation resulting in a modified amino acid. Non-limiting examples of a modified amino acid include a glycosylated amino acid, a sulfated amino acid, a prenylated (e.g., farnesylated, geranylgeranylated) amino acid, an acetylated amino acid, an acylated amino acid, a PEGylated amino acid, a biotinylated amino acid, a carboxylated amino acid, a phosphorylated amino acid, and the like.

The term “engineer” refers to any manipulation of a molecule that results in a detectable change in the molecule, wherein the manipulation includes but is not limited to inserting a polynucleotide and/or polypeptide heterologous to the cell and mutating a polynucleotide and/or polypeptide native to the cell.

The term “nucleic acid molecule” as used herein refers to deoxyribonucleotides, deoxyribonucleosides, ribonucleosides or ribonucleotides and polymers thereof in either single- or double-stranded form. By way of example only, such nucleic acids and nucleic acid polymers include, but are not limited to, analogues of natural nucleotides that have similar properties as a reference nucleic acid and oligonucleotide analogues including, but not limited to, PNA (peptidonucleic acid) and analogues of DNA used in antisense technology (phosphorothioates, phosphoramidates, and the like).

The terms “polypeptide,” “peptide” and “protein” as used herein refer to any chain of two or more amino acids bonded in sequence, regardless of length or post-translational modification. That is, a description directed to a polypeptide applies equally to a description of a peptide and a description of a protein, and vice versa. Amino acid residues include residues resulting from natural and unnatural amino acids. The terms “polypeptide,” “peptide” and “protein” apply to naturally-occurring amino acid polymers as well as to amino acid polymers in which one or more amino acid residues is an unnatural amino acid. Additionally, such “polypeptides,” “peptides” and “proteins” include amino acid chains of any length, including full length proteins, wherein the amino acid residues are linked by covalent peptide bonds or other linkages. The terms “target polypeptide”, “thioester-comprising polypeptide”, or “target thioester-comprising polypeptide” as used herein refer to a polypeptide that is to be targeted for functionalization according to the protein functionalization methods provided herein. The target polypeptide can be a polypeptide produced synthetically or recombinantly or via a combination of synthetic and recombinant methods.

The term “precursor polypeptide” or “intein-fused target polypeptide” as used herein refers to a polypeptide construct in which the target polypeptide is C-terminally fused to an intein protein or an engineered variant thereof. According to their common use in the art, the term “peptide” refers to any polypeptide consisting of from 2 up to 40-50 amino acid residues, whereas the term “protein” refers to any polypeptide consisting of more than 50 amino acid residues. These definitions are, however, not intended to be limiting.

The terms “intein” and “intein domain” as used herein refer to a naturally occurring or artificially constructed polypeptide sequence embedded within a precursor protein that can catalyze a splicing reaction during post-translational processing of the protein. The NEB Intein Registry (http://www.neb.com/neb/inteins.html) provides a list of known inteins. The term “split intein” as used herein refers to an intein that has two or more separate components not fused to one another.

The term “splicing” as used herein refers to the process involving the cleavage of the main backbone of an intein-comprising polypeptide by virtue of a reaction or process catalyzed by an intein or portions of an intein. “N-terminal splicing” refers to the cleavage of a polypeptide chain fused to the N-terminus of an intein, such reaction typically involving the scission of the thioester (or ester) bond formed via intein-catalyzed N→S (or N→O) acyl transfer, by action of a nucleophilic functional group or a chemical species comprising a nucleophilic functional group. “C-terminal splicing” refers to the cleavage of a polypeptide chain fused to the C-terminus of an intein. “Self-splicing” as used herein refers to the process involving the cleavage of an intein from a polypeptide, within which the intein is embedded.

The term “ligation” as used herein refers to a process or reaction that leads to the formation of a bond connecting two molecules. The term “intein-mediated ligation” as used herein refers to a chemical bond-forming reaction that involves a nucleophilic substitution at a thioester or ester linkage formed via intein-catalyzed N→S or N→O acyl transfer, by action of a nucleophilic functional group or a chemical species comprising a nucleophilic functional group.

The terms “vector” and “vector construct” as used herein refer to a vehicle by which a DNA or RNA sequence (e.g., a foreign gene) can be introduced into a host cell, so as to transform the host and promote expression (e.g., transcription and translation) of the introduced sequence. A common type of vector is a “plasmid”, which generally is a self-contained molecule of double-stranded DNA that can readily accept additional (foreign) DNA and which can readily be introduced into a suitable host cell. A large number of vectors, including plasmid and fungal vectors, have been described for replication and/or expression in a variety of eukaryotic and prokaryotic hosts. Non-limiting examples include pKK plasmids (Clontech), pUC plasmids, pET plasmids (Novagen, Inc., Madison, Wis.), pRSET or pREP plasmids (Invitrogen, San Diego, Calif.), or pMAL plasmids (New England Biolabs, Beverly, Mass.), and many appropriate host cells, using methods disclosed or cited herein or otherwise known to those skilled in the relevant art. The terms “express” and “expression” refer to allowing or causing the information in a gene or DNA sequence to become manifest, for example producing a protein by activating the cellular functions involved in transcription and translation of a corresponding gene or DNA sequence. A DNA sequence is expressed in or by a cell to form an “expression product” such as a protein. The expression product itself, e.g., the resulting protein, may also be said to be “expressed” by the cell. A polynucleotide or polypeptide is expressed recombinantly, for example, when it is expressed or produced in a foreign host cell under the control of a foreign or native promoter, or in a native host cell under the control of a foreign promoter.

The term “fused” as used herein means being connected through one or more covalent bonds. The term “bound” as used herein means being connected through non-covalent interactions. Examples of non-covalent interactions are van der Waals, hydrogen bond, electrostatic, and hydrophobic interactions. Thus, a “polypeptide tethered to a solid support” refers to a polypeptide that is connected to a solid support (e.g., surface, resin bead) either via non-covalent interactions or through covalent bonds.

The terms “label molecule” or “tag molecule” as used herein refer to a molecule that allows detection of or monitoring of the structural changes in another molecule covalently bound to it (e.g., a target polypeptide) by physical detection methods. Examples of physical detection methods include, but are not limited to, mass spectrometry, UV absorbance, fluorescence, luminescence, circular dichroism, nuclear magnetic resonance, and the like. The terms “affinity label molecule” or “affinity tag” as used herein refer to a molecule that allows for the isolation of another molecule covalently bound to it (e.g., a target polypeptide) by physical methods. Examples of physical methods include, but are not limited to, affinity chromatography, reverse-phase chromatography, ion-exchange chromatography, gel-permeation chromatography, and related techniques. The term “photoaffinity label,” as used herein, refers to a label molecule with a functional group, which, upon exposure to light, forms a linkage with a molecule for which the label molecule has an affinity. By way of example only, such a linkage may be covalent or non-covalent.

The term “dye,” as used herein, refers to a soluble, coloring substance that comprises a chromophore. The term “chromophore,” as used herein, refers to a molecule that absorbs light of visible wavelengths, UV wavelengths or IR wavelengths. The term “fluorescent molecule” as used herein refers to a molecule which upon excitation emits photons and is thereby fluorescent. The term “chemiluminescent molecule” as used herein refers to a molecule that emits light as a result of a chemical reaction without the addition of heat. By way of example only, luminol (5-amino-2,3-dihydro-1,4-phthalazinedione) reacts with oxidants like hydrogen peroxide (H₂O₂) in the presence of a base and a metal catalyst to produce an excited state product (3-aminophthalate, 3-APA) subsequently resulting in the release of detectable light. The term “energy transfer agent,” as used herein, refers to a molecule that can either donate or accept energy from another molecule. By way of example only, fluorescence resonance energy transfer (FRET) is a dipole-dipole coupling process by which the excited-state energy of a fluorescence donor molecule is non-radiatively transferred to an unexcited acceptor molecule which then fluorescently emits the donated energy at a longer wavelength.

The term “photocrosslinker,” as used herein, refers to a compound comprising two or more functional groups which, upon exposure to light, are reactive and form a covalent or non-covalent linkage with two or more monomeric or polymeric molecules.

The term “redox-active agent,” as used herein, refers to a molecule that oxidizes or reduces another molecule, whereby the redox-active agent becomes reduced or oxidized. Examples of redox-active agents include, but are not limited to, ferrocene, quinones, Ru^(2+/3+) complexes, Co^(2+/3+) complexes, and Os^(2+/3+) complexes.

The term “spin label,” as used herein, refers to molecules that comprise an atom or a group of atoms exhibiting an unpaired electron spin (i.e., a stable paramagnetic group) that can be detected by electron spin resonance spectroscopy and can be attached to another molecule. Such spin-label molecules include, but are not limited to, nitryl radicals and nitroxides, and may be single spin-labels or double spin-labels.

The term “heavy atom,” as used herein, refers to an atom that is usually heavier than carbon. Such ions or atoms include, but are not limited to, silicon, tungsten, gold, lead, and uranium.

The term “radioactive moiety,” as used herein, refers to a group whose nuclei spontaneously release nuclear radiation, such as alpha, or beta particles, or gamma radiation.

The term “contrast agent” as used herein refers to a molecule that can be visualized, typically in the context of a biological tissue or organism, by means of physical detection methods. The term “MRI contrast agent” as used herein refers to a molecule that can be visualized, typically in biological tissue or organism, by means of magnetic resonance imaging (MRI). Examples of MRI contrast agents are gadolinium-based complexes and the like. The term “PET agent” as used herein refers to a molecule that can be visualized, typically in biological tissue or organism, by means of positron emission tomography (PET).

The term “photocaged moiety,” as used herein, refers to a group that, upon illumination at certain wavelengths, covalently or non-covalently binds ions or other molecules. The term “photoisomerizable moiety,” as used herein, refers to a group that, upon illumination with light, changes from one isomeric form to another.

The term “chemically cleavable group” as used herein refers to a functional group that breaks or cleaves upon exposure to acid, base, oxidizing agents, reducing agents, chemical initiators, or radical initiators. The term “photocleavable group” as used herein refers to a functional group that breaks or cleaves upon exposure to light.

The term “electron dense group,” as used herein, refers to a group that scatters electrons when irradiated with an electron beam. Such groups include, but are not limited to, ammonium molybdate, bismuth subnitrate, cadmium iodide, carbohydrazide, ferric chloride hexahydrate, hexamethylene tetramine, and potassium ferricyanide.

The term “antibody fragment” as used herein refers to any form of an antibody other than the full-length form. Antibody fragments include but are not limited to Fv, Fc, Fab, and F(ab′)2, single chain Fv (scFv), diabodies, combinations of CDRs, heavy chains, or light chains, bispecific antibodies, and the like.

The term “biotin analogue,” or also referred to as “biotin mimic,” as used herein, is any molecule, other than biotin, that binds with high affinity to avidin and/or streptavidin.

The term “isotopically labeled molecule” as used herein refers to a molecule that contains an enriched amount of a specific isotope of (a) certain atom(s) within the molecule as compared to the normal isotopic distribution. Examples of “isotopically labeled molecules” include, but are not limited to, molecules comprising enriched amounts of ²H, ³H, ¹³C, ¹⁴N, ¹⁸F, and the like.

The term “polymer,” as used herein, refers to a molecule composed of repeated subunits. Such molecules include, but are not limited to, proteins, polypeptides, peptides, polynucleotides, polysaccharides, polyalkylene glycols, polyethylene, and polystyrene. As used herein, the term “water soluble polymer” refers to any polymer that is soluble in aqueous solvents. Such water soluble polymers include, but are not limited to, polyethylene glycol, polyethylene glycol propionaldehyde, mono C₁-C₁₀ alkoxy or aryloxy derivatives thereof, monomethoxy-polyethylene glycol, polyvinyl pyrrolidone, polyvinyl alcohol, polyamino acids, divinylether maleic anhydride, N-(2-Hydroxypropyl)-methacrylamide, dextran, dextran derivatives including dextran sulfate, polypropylene glycol, polypropylene oxide/ethylene oxide copolymer, polyoxyethylated polyol, heparin, heparin fragments, polysaccharides, oligosaccharides, glycans, cellulose and cellulose derivatives, including but not limited to methylcellulose and carboxymethyl cellulose, serum albumin, starch and starch derivatives, polypeptides, polyalkylene glycol and derivatives thereof, copolymers of polyalkylene glycols and derivatives thereof, polyvinyl ethyl ethers, and alpha-beta-poly[(2-hydroxyethyl)-DL-aspartamide], and the like, or mixtures thereof. By way of example only, coupling of such water soluble polymers to the target polypeptide according to the methods provided herein results in changes including, but not limited to, increased water solubility, increased or modulated serum half-life, increased or modulated therapeutic half-life relative to the unmodified form, increased bioavailability, modulated biological activity, extended circulation time, modulated immunogenicity, modulated physical association characteristics including, but not limited to, aggregation and multimer formation, altered receptor binding, altered binding to one or more binding partners, and altered receptor dimerization or multimerization.

The term “biologically active molecule” as used herein refers to any molecule that can affect any physical or biochemical properties of a biological system, pathway, molecule, or interaction relating to an organism, including but not limited to, viruses, bacteria, bacteriophage, transposon, prion, insects, fungi, plants, animals, and humans. Examples of biologically active molecules include, but are not limited to, peptides, proteins, DNA, RNA, small-molecule drugs, polysaccharides, carbohydrates, lipids, radionuclides, toxins, cells, viruses, liposomes, microparticles and micelles.

The term “drug” as used herein refers to any substance used in the prevention, diagnosis, alleviation, treatment, or cure of a disease or condition.

The term “cytotoxic” as used herein, refers to a compound that harms cells.

The term “solid support” is used in the commonly accepted meaning to indicate any solid inorganic or organic, polymeric or non-polymeric material onto which a given molecule can be covalently or non-covalently bound so that the molecule is immobilized onto the solid support. Non-limiting examples of “solid supports” include, but are not limited to, solid and semisolid matrixes, such as aerogels and hydrogels, resins, beads, biochips (including thin film coated biochips), microfluidic chip, a silicon chip, multi-well plates (also referred to as microtitre plates or microplates), membranes, cells, conducting and nonconducting metals, glass (including microscope slides) and magnetic supports. Other non-limiting examples of “solid supports” used in the methods and compositions described herein include silica gels, polymeric membranes, particles, derivatized plastic films, derivatized glass, controlled pore glass, derivatized silica, glass beads, cotton, plastic beads, alumina gels, polysaccharides such as Sepharose, poly(acrylate), polystyrene, poly(acrylamide), polyol, agarose, agar, cellulose, dextran, starch, FICOLL, heparin, glycogen, amylopectin, mannan, inulin, nitrocellulose, diazocellulose, polyvinylchloride, polypropylene, polyethylene (including poly(ethylene glycol)), nylon, latex bead, magnetic bead, paramagnetic bead, superparamagnetic bead, starch and the like. The configuration of the solid support can be in the form of beads, spheres, particles, gel, a membrane, or a surface. In certain embodiments, the solid supports used in the methods and compositions described herein are solid supports used for surface analysis such as surface acoustic wave devices or devices utilizing evanescent wave analysis, such as surface plasmon resonance analysis.

The term “resin” as used herein refers to high molecular weight, insoluble polymer beads. By way of example only, such beads may be used as supports for solid phase peptide synthesis, or sites for attachment of molecules prior to purification.

The term “nanoparticle” as used herein refers to a particle that has a particle size between about 500 nm (i.e., 500 nm±10%) to about 1 nm (i.e., 1 nm±10%).

The term “about” as used herein to modify a number, quantity, amount or numerical measurement, refers to a variation in that number, quantity, amount or numerical measurement from ±0% to ±10%.
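By way of illustration only, the ±10% convention for “about” can be written as a simple numerical check. The following Python sketch is not part of the disclosed methods; the function names `is_about` and `nanoparticle_size_bounds` and the default `tolerance` argument are hypothetical names chosen for this example.

```python
def is_about(value, target, tolerance=0.10):
    """Return True if `value` lies within target +/- (tolerance * target).

    Assumption: "about" is read as a symmetric window of up to 10%
    around the stated number, per the definition given above.
    """
    return abs(value - target) <= tolerance * abs(target)


def nanoparticle_size_bounds(lower_nm=1.0, upper_nm=500.0, tolerance=0.10):
    """Widest size window (in nm) implied by "about 1 nm" to "about 500 nm"."""
    return lower_nm * (1 - tolerance), upper_nm * (1 + tolerance)


if __name__ == "__main__":
    print(is_about(540, 500))          # inside the 10% window
    print(is_about(560, 500))          # outside the 10% window
    print(nanoparticle_size_bounds())  # lower and upper bounds in nm
```

Under this reading, a 540 nm particle is “about 500 nm”, and the nanoparticle definition above spans roughly 0.9 nm to 550 nm.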

The following examples are offered by way of illustration and not by way of limitation.

6. EXAMPLES

Example 1 Synthesis of 1-amino-2-(mercaptomethyl)-aryl compounds

This example demonstrates the synthesis of a protected amino-thiol-aryl precursor for the generation of 1-amino-2-(mercaptomethyl)-aryl reagents for protein/peptide functionalization using the methods provided herein. In particular, this example illustrates how an N- and S-protected, carboxylic group-functionalized 1-amino-2-(mercaptomethyl)-aryl moiety can be prepared, which can be used as a synthetic intermediate for the preparation of reagents of general formula (I) as further described in Examples 6 and 7. Additionally, this protected intermediate can be converted to 3-amino-4-(mercaptomethyl)benzoic acid, which can be used directly for protein functionalization as described in Example 10.

As described in the scheme of FIG. 2, the target compound 3-amino-4-(mercaptomethyl)benzoic acid (11) was prepared starting from methyl 3-amino-4-methylbenzoate 1 in five steps. Boc protection of the amino group in 1, followed by benzylic bromination, followed by substitution of the benzyl bromide with triphenylmethylmercaptan yielded the N-Boc,S-trityl protected intermediate 2. Hydrolysis of the methyl ester group in 2 under basic conditions then yielded the corresponding N-Boc,S-trityl protected benzoic acid derivative which contains a convenient carboxy group functionality that can be used for coupling various chemical entities (fluorescent dyes, affinity tags, etc.) to the 1-amino-2-(mercaptomethyl)-aryl moiety as described in Examples 6 and 7. This intermediate 7 was de-protected under acidic conditions to yield the carboxylic acid functionalized reagent 11, which can be used directly for protein functionalization.

Experimental Details for Example 1

Methyl 3-amino-4-methylbenzoate 1 (9.7 g, 58.7 mmol) and di-tert-butyl dicarbonate (17 mL, 74 mmol, 1.2 eq) were dissolved in 200 mL dry THF. The reaction mixture was heated to reflux for 72 h. Solvent was removed by rotovap to afford a pink-white solid. The crude material was suspended in 30 mL ice-cold hexanes and filtered to afford methyl 3-((tert-butoxycarbonyl)amino)-4-methylbenzoate as a white solid (99% yield). ¹H NMR (500 MHz, CDCl₃) δ=8.45 (s, 1H), 7.69 (d, J=7.9 Hz, 1H), 7.21 (d, J=7.9 Hz, 1H), 6.29 (s, 1H), 3.90 (s, 3H), 2.30 (s, 3H), 1.55 ppm (d, J=11.2 Hz, 9H). ¹³C NMR (126 MHz, CDCl₃) δ=166.97, 152.81, 136.43, 132.60, 130.37, 128.96, 124.90, 121.83, 80.84, 52.04, 28.31, 17.99 ppm. This material (6.63 g, 25 mmol) was diluted in 100 mL carbon tetrachloride and the flask was heated to 70° C. to aid solubility. N-Bromosuccinimide (4.89 g, 27.5 mmol, 1.1 eq) was added. The reaction vessel was equipped with a reflux condenser and irradiated with UV light for 3 hours. The reaction was cooled to room temperature then filtered. The filtrate was diluted in 100 mL DCM, washed with saturated K₂CO₃ (aq) and brine, then dried over anhydrous MgSO₄. Volatiles were removed to afford methyl 4-(bromomethyl)-3-((tert-butoxycarbonyl)amino)benzoate (6.7 g, 78%) as an orange-white solid. ¹H NMR (500 MHz, CDCl₃) δ=8.47 (s, 1H), 7.73 (dd, J=8.0, 1.7 Hz, 1H), 7.36 (d, J=8.0 Hz, 1H), 6.75 (s, 1H), 4.50 (s, 2H), 3.91 (s, 3H), 1.55 ppm (s, 9H). ¹³C NMR (126 MHz, CDCl₃) δ=28.2, 29.9, 52.3, 81.3, 123.8, 125.1, 130.0, 131.5, 131.7, 136.9, 152.6, 166.2 ppm. Methyl 4-(bromomethyl)-3-((tert-butoxycarbonyl)amino)benzoate (6.7 g, 19.59 mmol), triphenylmethyl mercaptan (6.49 g, 23.5 mmol, 1.2 eq) and potassium carbonate (3.25 g, 23.5 mmol, 1.2 eq) were dissolved in 100 mL dry DMF. The reaction was stirred under argon at room temperature for 15 hours, concentrated to 10 mL under reduced pressure, then resuspended in DCM. The solution was washed once with ice-cold H₂O, once with saturated NaHCO₃, and finally once with brine. The organic layer was dried over anhydrous MgSO₄, filtered, and volatiles were removed to afford methyl 3-((tert-butoxycarbonyl)amino)-4-((tritylthio)methyl)benzoate 2 as a golden-yellow solid (10.24 g, 97% crude yield). Material was carried forward without further purification. ¹H NMR (500 MHz, CDCl₃) δ=8.41 (s, 1H), 7.65 (d, J=7.9 Hz, 1H), 7.48 (d, J=8.0 Hz, 5H), 7.34 (t, J=7.8 Hz, 6H), 7.25 (t, J=7.3 Hz, 5H), 7.18 (d, J=8.0 Hz, 1H), 6.72 (s, 1H), 3.88 (s, 3H), 3.21 (s, 2H), 1.56 ppm (d, J=2.5 Hz, 9H). ¹³C NMR (126 MHz, CDCl₃) δ=166.69, 152.84, 144.09, 136.93, 130.75, 129.34, 128.23, 126.98, 124.89, 123.09, 80.77, 67.42, 52.14, 34.08, 28.38 ppm.
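The percent yields reported above follow from the isolated mass, the molar mass of the product, and the amount of limiting starting material. As a hedged illustration, the Python sketch below reproduces the yield reported for the benzylic bromination step (6.7 g of bromide from 25 mmol of Boc-protected ester). The molecular composition C₁₄H₁₈BrNO₄ is derived here from the compound name and is an assumption of this example, as are the helper names `molar_mass` and `percent_yield` and the rounded standard atomic weights.

```python
# Rounded standard atomic weights (g/mol); assumed values for illustration.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Br": 79.904}


def molar_mass(composition):
    """Molar mass (g/mol) from a mapping of element symbol -> atom count."""
    return sum(ATOMIC_WEIGHTS[element] * count for element, count in composition.items())


def percent_yield(product_mass_g, product_molar_mass, limiting_mmol):
    """Percent yield from isolated product mass and mmol of limiting reagent."""
    product_mmol = product_mass_g / product_molar_mass * 1000.0
    return 100.0 * product_mmol / limiting_mmol


# Methyl 4-(bromomethyl)-3-((tert-butoxycarbonyl)amino)benzoate,
# composition derived from the name (assumption): C14H18BrNO4.
BROMIDE = {"C": 14, "H": 18, "Br": 1, "N": 1, "O": 4}

if __name__ == "__main__":
    mw = molar_mass(BROMIDE)
    print(f"molar mass: {mw:.1f} g/mol")
    print(f"yield: {percent_yield(6.7, mw, 25.0):.0f}%")
```

The same two helpers can be applied to any other step for which the isolated mass and the amount of limiting reagent are stated.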

Methyl 3-((tert-butoxycarbonyl)amino)-4-((tritylthio)methyl)benzoate 2 (1.6 g, 2.96 mmol) was dissolved in 37 mL THF. 1.0 M lithium hydroxide (aq) (7.54 mL) was added and the reaction mixture was stirred under argon at ambient temperature for 48 hours. Following completion, volatiles were removed under reduced pressure and the resulting material was dissolved in ethyl acetate and washed once with 0.25 M HCl (aq) and once with brine. The organic layer was dried over anhydrous MgSO₄, filtered, and concentrated in vacuo to yield carboxylic acid AMA derivative 7 as an off-white solid (1.6 g, quant. yield). ¹H NMR (400 MHz, D4-MeOH) δ 7.99 (s, 1H), 7.67 (dd, J=7.97, 1.62 Hz, 1H), 7.43 (q, J=3.13 Hz, 6H), 7.31 (t, J=7.46 Hz, 6H), 7.23 (t, J=7.31 Hz, 3H), 7.09 (d, J=8.07 Hz, 2H), 3.33 (s, 2H), 1.49 ppm (s, 9H).

3-((tert-butoxycarbonyl)amino)-4-((tritylthio)methyl)benzoic acid 7 (175.6 mg, 0.334 mmol) was dissolved in 2 mL anhydrous dichloromethane under argon. Triisopropylsilane (135 μL, 0.668 mmol) was added and the solution was cooled to 0° C. Trifluoroacetic acid (1 mL) was added and the reaction mixture was stirred for 20 minutes before being warmed to room temperature and stirred for another 20 minutes. Volatiles were removed under reduced pressure and the resulting solid was suspended in cold hexanes and filtered. The resulting white solid was collected as the trifluoroacetate salt of 3-amino-4-(mercaptomethyl)benzoic acid 11 (quantitative yield). LCMS [M+H]⁺ for disulfide C₁₆H₁₆N₂O₄S₂: calculated 365.43, found 365.68.
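The calculated [M+H]⁺ value quoted for the disulfide form of 11 can be checked from its molecular formula. The Python sketch below is an illustration only: it assumes rounded average atomic weights and approximates the added proton by the atomic weight of hydrogen, and the function names `average_mass` and `m_plus_h` are hypothetical. With these assumptions the result agrees with the reported calculated value of 365.43 to within rounding.

```python
import re

# Rounded average atomic weights (g/mol); assumed values for illustration.
AVERAGE_ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def average_mass(formula):
    """Average molecular mass for a simple formula such as 'C16H16N2O4S2'.

    Handles element symbols followed by optional integer counts only;
    parentheses, charges, and isotopes are outside this sketch.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += AVERAGE_ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total


def m_plus_h(formula):
    """Approximate [M+H]+ value: average mass plus one hydrogen (simplification)."""
    return average_mass(formula) + AVERAGE_ATOMIC_WEIGHTS["H"]


if __name__ == "__main__":
    print(f"{m_plus_h('C16H16N2O4S2'):.2f}")
```

The same function can be applied to the other disulfide formulae reported in these examples, provided every element in the formula appears in the weight table.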

Example 2 Synthesis of additional 1-amino-2-(mercaptomethyl)-aryl compounds

This example demonstrates the synthesis of compounds of general formula (II) which can be used for the purpose of protein/peptide functionalization using the methods provided herein. As described by the scheme in FIG. 3, the desired reagent 3-(mercaptomethyl)-4-amino-benzoic acid (17) was prepared starting from methyl 4-amino-3-methylbenzoate 12 in five steps. Introduction of a tertiary butyl carbamate protecting group to the aryl amino group followed by benzylic bromination and introduction of a thiol functionality through substitution of the benzylic position using the reagent triphenylmethylmercaptan yielded an N-Boc, S-trityl protected intermediate 15. Hydrolysis of the methyl ester to the free carboxylic acid using aqueous lithium hydroxide could provide a convenient chemical handle, which can be used for coupling various chemical entities (fluorescent dyes, affinity tags, etc.) to the amino-thiol moiety. The carboxylic acid intermediate 16 was de-protected using trifluoroacetic acid in the presence of triisopropylsilane to yield reagent 17, which was used directly in protein ligation studies in Example 10. It is understood that other regioisomers of the reagents of formula (I) and (II), such as reagents of general formula (III) and (IV), can be prepared in a similar manner.

Experimental Details for Example 2

Methyl 4-amino-3-methylbenzoate 12 (1.0 g, 6.06 mmol) and di-tert-butyl dicarbonate (1.59 g, 7.27 mmol, 1.2 eq) were dissolved in 20 mL dry THF. The reaction mixture was heated to reflux for 96 hours. Solvent was removed by rotovap to afford a pink-white solid. The crude material was suspended in 30 mL ice-cold hexanes and filtered to afford methyl 4-((tert-butoxycarbonyl)amino)-3-methylbenzoate 13 as a white solid (1.57 g, 98% yield). This material (1.57 g, 5.93 mmol) was diluted in 20 mL carbon tetrachloride and the flask was heated to 70° C. to aid solubility. N-Bromosuccinimide (1.16 g, 6.53 mmol, 1.1 eq) was added. The reaction vessel was equipped with a reflux condenser and irradiated with UV light for 3 hours. The reaction was cooled to room temperature then filtered. The filtrate was diluted in 100 mL DCM, washed with saturated K₂CO₃ (aq) and brine, then dried over anhydrous MgSO₄. Volatiles were removed to afford methyl 3-(bromomethyl)-4-((tert-butoxycarbonyl)amino)benzoate 14 (1.78 g, 87%) as an orange-white solid.

Methyl 3-(bromomethyl)-4-((tert-butoxycarbonyl)amino)benzoate 14 (1.78 g, 5.17 mmol), triphenylmethyl mercaptan (1.71 g, 6.2 mmol, 1.2 eq) and potassium carbonate (0.857 g, 6.20 mmol, 1.2 eq) were dissolved in 100 mL dry DMF. The reaction was stirred under argon at room temperature for 15 hours, concentrated to 10 mL by rotovap, then resuspended in DCM. The solution was washed once with ice-cold H₂O, once with saturated NaHCO₃, and finally once with brine. The organic layer was dried over anhydrous MgSO₄, filtered, and volatiles were removed to afford methyl 4-((tert-butoxycarbonyl)amino)-3-((tritylthio)methyl)benzoate 15 as a golden-yellow solid (80% crude yield). Material was carried forward without further purification.

Methyl 4-((tert-butoxycarbonyl)amino)-3-((tritylthio)methyl)benzoate 15 (0.7 g, 1.29 mmol) was dissolved in 8 mL THF. 1.0 M lithium hydroxide (aq) (3.25 mL) was added and the reaction mixture was stirred under argon at ambient temperature for 48 hours. Following completion, volatiles were removed under reduced pressure and the resulting material was dissolved in ethyl acetate and washed once with 0.25 M HCl (aq) and once with brine. The organic layer was dried over anhydrous MgSO₄, filtered, and concentrated in vacuo to yield carboxylic acid AMA derivative 16 as an off-white solid (0.678 g, quant. yield).

4-((tert-butoxycarbonyl)amino)-3-((tritylthio)methyl)benzoic acid 16 (0.678 g, 1.29 mmol) was dissolved in 6 mL anhydrous dichloromethane under argon. Triisopropylsilane (808 μL, 4 mmol) was added and the solution was cooled to 0° C. Trifluoroacetic acid (3 mL) was added and the reaction mixture was stirred for 20 minutes before being warmed to room temperature and stirred for another 20 minutes. Volatiles were removed under reduced pressure and the resulting solid was suspended in cold hexanes and filtered. The resulting white solid was collected as the trifluoroacetate salt of 4-amino-3-(mercaptomethyl)benzoic acid 17 (quant. yield). LCMS [M+H]⁺ for disulfide C₁₆H₁₆N₂O₄S₂: calculated 365.43, found 365.56.

Example 3 Synthesis of Oxyamine-Comprising Protein Labeling Reagents

This example demonstrates the synthesis of a protein labeling reagent of general formula (I) comprising a bioorthogonal oxyamine functional group (—ONH₂) as the R group. According to the methods described herein, this reagent can be used for linking a target polypeptide to a bioorthogonal oxyamino functionality, which can be used for further coupling a chemical species to the polypeptide via oxime ligation.

As described in the scheme in FIG. 2, methyl ester 2 was reduced to a benzylic alcohol using Lithium Aluminum Hydride. This benzylic alcohol was activated with methanesulfonyl chloride to prepare the mesylate derivative 4 which was then reacted with N-Boc-hydroxylamine to produce the protected intermediate. This compound was subsequently deprotected with trifluoroacetic acid in the presence of triisopropylsilane to yield the oxyamino-containing reagent 8.

Experimental Details for Example 3

Compound 2 (20.32 g, 48 mmol) was dissolved in 400 mL anhydrous THF then cooled to 0° C. 1 M lithium aluminum hydride in THF solution (52.8 mL, 52.8 mmol, 1.1 eq) was slowly added. The reaction was stirred under argon at 0° C. for 3 hours. The reaction was quenched by the slow addition of 3 mL cold H₂O and 1 mL 4 N NaOH (aq) at 0° C. then stirred for 10 min at room temperature. The resulting mixture was concentrated under reduced pressure to 20 mL and taken up in a mixture of 300 mL EtOAc and 30 mL saturated NaHCO₃, agitated to suspend insoluble solids, then filtered through a Celite pad. The filtrate was washed once with saturated NaHCO₃ then with brine. The organic layer was dried with anhydrous MgSO₄ and volatiles were removed to afford a yellow solid which was purified via flash column chromatography (silica gel, Hex:EtOAc) to afford a yellow oil (18 g, 95% yield). ¹H NMR (500 MHz, CDCl₃) δ 7.78 (s, 1H), 7.49 (d, J=7.3 Hz, 5H), 7.34 (t, J=7.7 Hz, 5H), 7.26 (t, J=3.0 Hz, 5H), 7.13 (d, J=7.8 Hz, 1H), 7.01 (d, J=7.8 Hz, 1H), 6.73 (s, 1H), 4.63 (s, 2H), 3.17 (s, 2H), 1.54 ppm (s, 9H). ¹³C NMR (126 MHz, CDCl₃) δ 153.06, 144.28, 141.49, 136.85, 130.96, 129.35, 128.18, 126.88, 124.50, 122.23, 120.36, 80.49, 67.17, 65.09, 33.91, 28.41 ppm. This material (9.3 g, 18.19 mmol) was dissolved in 100 mL anhydrous DCM and the solution was cooled to 0° C. Methanesulfonyl chloride (1.8 mL, 23.66 mmol, 1.3 eq) and DIPEA (4.2 mL, 23.66 mmol, 1.3 eq) were added. The reaction was stirred under argon at 0° C. for 2 hours. Following completion, the reaction mixture was diluted to 300 mL with DCM, washed twice with saturated NaHCO₃, then once with brine. The organic layer was dried over magnesium sulfate and volatiles were removed to afford yellow solid 4 (9.42 g, 88% yield). The material was carried forward without further purification. ¹H NMR (500 MHz, CDCl₃) δ 7.88 (s, 1H), 7.49 (d, J=7.3 Hz, 5H), 7.34 (t, J=7.7 Hz, 5H), 7.26 (d, J=14.6 Hz, 5H), 7.16 (d, J=7.8 Hz, 1H), 7.04 (d, J=9.5 Hz, 1H), 6.75 (s, 1H), 5.18 (s, 2H), 3.17 (s, 2H), 2.90 (s, 3H), 1.54 ppm (s, 9H). ¹³C NMR (126 MHz, CDCl₃) δ 152.85, 144.14, 137.33, 133.72, 131.28, 129.32, 128.23, 126.97, 126.26, 123.83, 121.95, 80.79, 71.27, 67.32, 38.45, 33.92, 28.40 ppm.

Compound 4 (1.06 g, 1.8 mmol) was dissolved in 18 mL dry MeCN. The solution was cooled to 0° C. and tert-butyl N-hydroxycarbamate (0.32 g, 2.4 mmol, 1.3 eq) then 1,8-diazabicyclo[5.4.0]undec-7-ene (DBU) (0.37 mL, 2.4 mmol, 1.3 eq) were slowly added. The reaction was stirred at 0° C. for 1 hour and was then warmed to ambient temperature and stirred under argon overnight. Following completion, volatiles were removed and the resulting crude mixture was dissolved in DCM, washed with saturated K₂CO₃ (aq) then with brine. The organic layer was dried over anhydrous MgSO₄ then concentrated to afford a yellow oil. The crude material was purified via flash chromatography (silica gel, Hex:EtOAc) to afford a yellow oil (1.005 g, 89% yield). MS-ESI [M+Na]⁺ calculated for C₃₇H₄₂N₂O₅S 649.79, found 649.33; ¹H NMR (400 MHz, CDCl₃) δ 7.81 (s, 1H), 7.49 (d, J=7.6 Hz, 6H), 7.32 (q, J=7.6 Hz, 6H), 7.24 (t, J=7.2 Hz, 3H), 7.13 (d, J=7.6 Hz, 2H), 7.02 (dd, J=8 Hz, 1.6 Hz, 1H), 6.74 (s, 1H), 4.79 (s, 2H), 3.17 (s, 1H), 1.54 (s, 9H), 1.46 ppm (s, 4H); ¹³C NMR (126 MHz, CDCl₃) δ 156.58, 152.91, 144.23, 136.89, 136.29, 130.84, 129.33, 128.16, 126.86, 125.41, 124.17, 122.32, 81.61, 80.46, 77.9, 67.19, 33.95, 28.37, 27.56 ppm. The protected precursor (0.551 g, 0.88 mmol) was dissolved in 9 mL anhydrous DCM. The solution was cooled to 0° C. and triisopropylsilane (TIPS) (0.45 mL, 2.2 mmol) was added followed by the slow addition of 2 mL trifluoroacetic acid (TFA). The reaction was stirred under argon at 0° C. for 30 minutes, then warmed to ambient temperature and concentrated under reduced pressure to afford an off-white solid. This solid was washed with ice-cold hexanes to afford 8 as an off-white solid (0.366 g, quantitative yield). MS-ESI [M+H]⁺ for disulfide C₁₆H₂₂N₄O₂S₂ calculated 367.51, found 367.53. ¹H NMR (500 MHz, D4 MeOH) δ=7.06 (d, J=8 Hz, 1H), 6.76 (d, J=1.5 Hz, 1H), 6.65 (dd, J=8, 1.5 Hz, 1H), 4.56 (s, 2H), 3.69 (s, 2H), 1.38 ppm (s, 1H); ¹³C NMR (126 MHz, D4 MeOH) δ=146.52, 136.56, 130.43, 126.62, 118.90, 117.34, 78.92, 25.84 ppm.

Example 4 Synthesis of an Azide-Containing Protein Labeling Reagent

This example demonstrates the synthesis of a protein labeling reagent of general formula (I) comprising a bioorthogonal azide functional group (—N₃) as the R group. According to the methods of the invention, this reagent can be used for linking a target polypeptide to a bioorthogonal azide functionality, which can be used for further coupling a chemical species to the polypeptide using methods known in the art (e.g., via Cu(I)-catalyzed azide/alkyne 1,3-dipolar cycloaddition).

As described in the scheme in FIG. 2, mesylate derivative 4 was reacted with sodium azide to produce the protected intermediate 5. This compound was subsequently deprotected with trifluoroacetic acid in the presence of triisopropylsilane to yield the azide-containing reagent 6.

Experimental Details for Example 4

Compound 4 (2.5 g, 4.24 mmol) and sodium azide (0.56 g, 8.6 mmol) were dissolved in anhydrous DMF (30 mL), and the mixture was stirred under argon at ambient temperature for 12 h. The reaction mixture was then dissolved in CH₂Cl₂ (150 mL) and washed with saturated NaHCO₃ (aq) and with brine. The organic layer was dried over anhydrous MgSO₄, filtered, and concentrated under reduced pressure to afford a yellow oil, which was purified on silica gel with hexanes/EtOAc (1:1) as eluent to afford 5 as a yellow oil (2.3 g, quant.). ¹H NMR (CDCl₃, 400 MHz): δ=7.80 (s, 1H), 7.50 (t, J=4.38 Hz, 6H), 7.34 (t, J=7.64 Hz, 6H), 7.25 (t, J=7.28 Hz, 3H), 7.14 (d, J=7.80 Hz, 1H), 6.94 (dd, J=7.98, 1.70 Hz, 1H), 6.76 (s, 1H), 4.28 (s, 2H), 3.17 (s, 2H), 1.55 ppm (s, 9H); ¹³C NMR (CDCl₃, 126 MHz): δ=152.9, 144.2, 137.2, 135.8, 131.2, 129.3, 128.2, 126.9, 123.2, 121.3, 80.6, 67.2, 54.4, 33.9, 28.4 ppm; MS-ESI: calcd for C₃₂H₃₂N₄O₂S: 559.68 [M+Na]⁺; found: 559.22.

Azide 5 (20 mg, 0.037 mmol) was dissolved in 2 mL anhydrous dichloromethane under argon. Triisopropylsilane (23.6 μL, 0.117 mmol) was added and the solution was cooled to 0° C. Trifluoroacetic acid (1 mL) was added and the reaction mixture was stirred for 20 minutes before being warmed to room temperature and stirred for another 20 minutes. Volatiles were removed under reduced pressure and the resulting solid was washed exhaustively with ice-cold hexanes. The resulting yellow oil was collected as the trifluoroacetate salt of (2-amino-4-(azidomethyl)phenyl)methanethiol 6 (quantitative yield). LCMS [M+H]⁺ for disulfide C₁₆H₁₈N₈S₂: calculated 387.50, found 387.57.

Example 5 Synthesis of Additional Oxyamine-Comprising Protein Labeling Reagents

This example further demonstrates the synthesis of protein labeling reagents of general formula (I) comprising a bioorthogonal oxyamine functional group (—ONH₂) as R group. According to the methods described herein, this reagent can be used for functionalizing a target polypeptide with a bioorthogonal oxyamino functionality, which can be used for further coupling a chemical species to the polypeptide via oxime ligation.

As described in the scheme in FIG. 2, azide derivative 5 was reacted with tert-butyl (prop-2-yn-1-yloxy)carbamate via copper-catalyzed 1,3-dipolar cycloaddition. This compound was subsequently deprotected with trifluoroacetic acid in the presence of triisopropylsilane to yield the oxyamino-containing reagent 9. Experimental details for the synthesis of oxyamine-comprising labeling reagents 10A and 10B (FIG. 2) can be found in (Frost, Vitali et al. 2013).

Experimental Details for Example 5

Propargyl bromide (80% by weight in toluene; 1.6 g, 13.44 mmol) was dissolved in dry MeCN (40 mL), and the mixture was cooled to 0° C. tert-Butyl-N-hydroxycarbamate (2.32 g, 17.47 mmol, 1.3 equiv) and DBU (2.61 mL, 17.47 mmol, 1.3 equiv) were added. The reaction mixture was stirred for 20 min at 0° C., then warmed to ambient temperature, and stirred for another 1 h. Volatiles were removed under reduced pressure, and the resulting yellow oil was suspended in CH₂Cl₂, washed twice with saturated NaHCO₃ (aq) and once with brine, then dried over anhydrous MgSO₄. Volatiles were removed under reduced pressure, and the resulting crude material was purified on silica gel (Hexanes/EtOAc 8:1→7:3) to give tert-butyl (prop-2-yn-1-yloxy)carbamate (1.5 g, 65% yield). ¹H NMR (CDCl₃, 400 MHz): δ=7.39 (s, 1H), 4.48 (d, J=2 Hz, 2H), 2.5 (s, 1H), 1.49 ppm (s, 1H); ¹³C NMR (CDCl₃, 126 MHz): δ=156.5, 82.1, 78.3, 75.6, 63.7, 28.2 ppm.

Compound 5 (0.1 g, 0.186 mmol) and tert-butyl (prop-2-yn-1-yloxy)carbamate (0.127 g, 0.745 mmol, 4 equiv) were dissolved in THF/H₂O (1:1, 6 mL). CuSO₄ (0.045 g, 0.28 mmol, 1.5 equiv) and sodium ascorbate (0.147 g, 0.745 mmol, 4 equiv) were added, and the reaction mixture was stirred at room temperature for 30 min, then dissolved in CH₂Cl₂ and washed twice with concentrated ammonium hydroxide, once with saturated NaHCO₃ (aq), and once with brine, then dried over anhydrous MgSO₄. Volatiles were removed under reduced pressure, and the resulting material was purified on silica gel (hexanes/EtOAc 7:3) to yield a protected precursor (0.094 g, 72% yield). ¹H NMR (CDCl₃, 400 MHz): δ=7.77 (br s, 1H), 7.54 (s, 1H), 7.47 (d, J=4 Hz, 6H), 7.38 (s, 1H), 7.33 (t, J=8 Hz, 6H), 7.26-7.23 (m, 3H), 7.11 (d, J=8 Hz, 1H), 6.68-6.83 (m, 1H), 6.76 (s, 1H), 5.47 (s, 2H), 4.96 (s, 2H), 3.15 (s, 2H), 1.53 (s, 9H), 1.45 ppm (s, 1H); MS-ESI: calculated for C₄₀H₄₅N₅O₅S: 730.87 [M+Na]⁺; found: 730.26.
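As a consistency check on the stoichiometry reported above, the following minimal Python sketch recomputes the alkyne equivalents and the percent yield of the protected precursor from the quantities given in the text. The helper names are illustrative and not part of the original procedure, and standard average atomic masses are assumed.

```python
import re

# Standard average atomic masses in g/mol (assumed values)
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def molecular_weight(formula):
    """Average molecular weight of a simple formula such as 'C40H45N5O5S'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def equivalents(reagent_mmol, limiting_mmol):
    """Molar equivalents of a reagent relative to the limiting reactant."""
    return reagent_mmol / limiting_mmol


def percent_yield(product_g, product_formula, limiting_mmol):
    """Isolated yield as a percentage of the theoretical mass."""
    theoretical_g = molecular_weight(product_formula) * limiting_mmol / 1000.0
    return 100.0 * product_g / theoretical_g


# Quantities taken from the procedure above
alkyne_equiv = equivalents(0.745, 0.186)
yield_pct = percent_yield(0.094, "C40H45N5O5S", 0.186)

print(f"alkyne: {alkyne_equiv:.1f} equiv, yield: {yield_pct:.0f}%")
```

The computed values should fall close to the 4 equiv and roughly 72% stated in the text. Small differences are expected because the reported masses are rounded.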

The protected precursor (0.094 g, 0.133 mmol) was deprotected with TFA in CH₂Cl₂, as described above for 6, to afford 9 (0.065 g, quant.). ¹H NMR (CD₃OD, 500 MHz): δ=8.00 (s, 1H), 7.08 (d, J=8 Hz, 1H), 6.71 (d, J=1.5 Hz, 1H), 6.63 (dd, J=8, 1.5 Hz, 1H), 5.468 (s, 2H), 4.933 (s, 2H), 3.671 ppm (s, 2H); ¹³C NMR (CD₃OD, 126 MHz): δ=146.8, 143.4, 136.4, 131.0, 127.3, 126.0, 118.9, 116.8, 68.9, 54.9, 25.6 ppm; MS-ESI: calculated for disulfide C₂₂H₂₈N₁₀O₂S₂: 529.66 [M+H]⁺; found: 529.18.

Example 6 Synthesis of Coumarin-Comprising Protein Labeling Reagent

This example demonstrates the synthesis of a protein labeling reagent of general formula (I) comprising a fluorescent dye as the R group. According to the methods described herein, this reagent can be used for labeling a target polypeptide with a fluorescent label molecule.

As described in the scheme of FIG. 4, 7-amino-4-(trifluoromethyl)coumarin 18 was first coupled to N-Boc protected glycine 19. The glycine serves as a linker unit and adds an additional amide bond to increase the solubility of the labeling reagent in aqueous buffer. The Boc group on the glycine was removed under acidic conditions to yield a primary amine (compound 21). This intermediate was then coupled to the carboxylic acid functionalized protected intermediate 7 (FIG. 2) to yield the protected intermediate 22. This compound was then deprotected under acidic conditions to yield the fluorescent labeling reagent 23.

Experimental Details for Example 6

7-Amino-4-(trifluoromethyl)coumarin 18 (550 mg, 2.4 mmol) and N-Boc-glycine 19 (462.5 mg, 2.64 mmol) were dissolved in 9 mL dry pyridine and the solution was cooled to −15° C. Phosphoryl chloride (245 uL, 2.64 mmol) was slowly added drop wise and the solution was stirred at −15° C. for 1 hour. The reaction mixture was poured into water and extracted with EtOAc. The EtOAc layer was washed once with 10% aqueous citric acid, once with saturated aqueous sodium bicarbonate, and once with brine. The organic layer was dried over anhydrous magnesium sulfate, filtered, and concentrated under reduced pressure. The crude product was purified by flash column chromatography (silica gel, Hex:EtOAc) to yield 291 mg of the desired product 20 (31%). MS (ESI) [2M+Na]⁺ calculated: 795.72, observed: 796.25.

Product 20 (291 mg, 0.75 mmol) was dissolved in 4 mL anhydrous dichloromethane and cooled to 0° C. Trifluoroacetic acid (2 mL) was slowly added to the reaction mixture and the solution was stirred at 0° C. for 30 minutes. The reaction mixture was warmed to room temperature, concentrated in vacuo, and then resuspended in dichloromethane. The organic layer was washed once with saturated potassium carbonate and once with brine, dried over magnesium sulfate, filtered, and concentrated in vacuo to yield 21 (200 mg, 93%). This material was used without further purification.

Amine 21 (70.78 mg, 0.247 mmol) and the carboxylic acid 7 (100 mg, 0.19 mmol) were dissolved in 2 mL dry DMF. To that solution was added HBTU (108.08 mg, 0.285 mmol) and then triethylamine (39 uL, 0.285 mmol). The reaction mixture was stirred for 18 hr at room temperature, then dissolved in ethyl acetate and washed once with saturated aqueous ammonium chloride, once with saturated aqueous sodium bicarbonate, and once with brine, then dried over anhydrous magnesium sulfate, filtered, and concentrated in vacuo. The crude material was chromatographed on silica gel (Hex:EtOAc) and the resulting material was loaded on a silica plug and eluted with a mixture of 70% dichloromethane, 24% chloroform, 5.4% methanol and 0.6% ammonium hydroxide. Volatiles were removed in vacuo to yield the protected coumarin-containing reagent 22 (24 mg, 16%). MS (ESI) [M+Na]⁺ calculated: 816.8, observed: 816.14.

22 (24 mg, 0.03 mmol) was dissolved in 0.7 mL of anhydrous CH₂Cl₂ and the solution was cooled to 0° C. Triisopropylsilane (18.2 uL, 0.09 mmol) was added followed by the drop wise addition of 300 uL trifluoroacetic acid. The reaction mixture was stirred at 0° C. for 30 min then warmed to room temperature. Volatiles were removed under reduced pressure and the material was washed with ice cold hexanes to yield yellow solid 23 (quantitative yield). LCMS [M+H]⁺ for disulfide C₄₀H₃₀F₆N₆O₈S₂ calculated 901.82 found 901.56

Example 7 Synthesis of Biotin-Comprising Protein Labeling Reagent

This example demonstrates the synthesis of a protein labeling reagent of general formula (I) comprising a biotin affinity tag as the R group. According to the methods described herein, this reagent can be applied for labeling a target polypeptide with an affinity tag molecule to enable the isolation/immobilization of the polypeptide via affinity chromatography/capturing using, for example, streptavidin-functionalized solid supports.

As described in the scheme in FIG. 5, 1,3-diamino-propane was first coupled to the carboxylic acid functionalized intermediate 7 (FIG. 2) to add a linker to the latter. Biotin was then coupled to the amine intermediate 24 to yield the protected product 25. This compound was then de-protected to yield the biotin-containing protein labeling reagent 26.

Experimental Details for Example 7

Carboxylic acid 7 (300 mg, 0.57 mmol) was dissolved in 6 mL anhydrous dichloromethane. To that solution was added HBTU (324.6 mg, 0.856 mmol) and then triethylamine (196 uL, 1.43 mmol). The reaction mixture was cooled to 0° C. and stirred for 30 min. The solution was warmed to room temperature and 1,3-propane diamine (422.5 mg, 5.7 mmol) was added. The reaction mixture was stirred at room temperature for 16 hours, then diluted with dichloromethane and washed twice with saturated aqueous sodium bicarbonate and once with brine, then dried over anhydrous magnesium sulfate, filtered, and concentrated. The resulting crude material was chromatographed on silica gel (70% dichloromethane, 24% chloroform, 5.4% methanol and 0.6% ammonium hydroxide) to yield 24 (75 mg, 23%). MS (ESI) [M+H]⁺ calculated: 582.3, observed: 582.29.

24 (75 mg, 0.129 mmol) was dissolved in 1.5 mL dry DMF. Biotin (41 mg, 0.167 mmol) was added followed by HBTU (74 mg, 0.19 mmol) and Triethylamine (26.7 uL, 0.19 mmol) and the reaction stirred at room temperature for 8 hours. Following completion the reaction mixture was dissolved in dichloromethane and washed once with water, once with saturated aqueous sodium bicarbonate and once with brine then dried over anhydrous magnesium sulfate, filtered, and concentrated under reduced pressure. The crude mixture was chromatographed on silica gel using a mixture of 70% Dichloromethane, 24% Chloroform, 5.4% methanol and 0.6% ammonium hydroxide to yield 25 (40 mg, 38%) MS (ESI) [M+H]⁺ calculated: 808.35, observed: 808.1.

25 (40 mg, 0.05 mmol) was dissolved in 0.7 mL of anhydrous CH₂Cl₂ and the solution was cooled to 0° C. Triisopropylsilane (30 uL, 0.15 mmol) was added followed by the drop wise addition of 300 uL trifluoroacetic acid. The reaction mixture was stirred at 0° C. for 30 min then warmed to room temperature. Volatiles were removed under reduced pressure and the material was washed with ice cold hexanes to yield yellow solid 26 (quantitative yield). LCMS [M+H]⁺ for C₂₁H₃₁N₅O₃S₂ calculated 466.63 found 466.24

Example 8 Synthesis of N-(2-mercaptoethyl)-amino-aryl-based reagents

This example demonstrates the synthesis of a synthetic intermediate useful for the generation of N-(2-mercaptoethyl)-amino-aryl-based reagents for protein/peptide functionalization of the type (V)-(VIII) according to the methods described herein. In particular, the synthesis of a reagent of the type of compounds of general formula (V) is demonstrated. As shown in Examples 1 and 2, it is understood that similar synthetic procedures as those described in the present example can be applied for preparing other regioisomers of the reagent of type (V), such as reagents of general formula (VI), (VII) and (VIII).

As described in the scheme in FIG. 6, aniline 27 and meta-methyl aniline 31 were converted to the target molecules 30 and 34, respectively, in three steps each. Introduction of a chloroethyl functionality was achieved through reductive amination of the aniline precursor with α-chloroacetaldehyde in the presence of sodium cyanoborohydride. The chloride precursors were reacted with potassium thiocyanate and the cyano group was removed with lithium aluminum hydride to generate amino thiol reagents of general formula (V).

Experimental Details for Example 8

Aniline 27 (0.2 g, 2.1 mmol) was dissolved in 10 mL ethanol. To this was added acetic acid (0.126 g, 2.1 mmol) and sodium cyanoborohydride (0.264 g, 4.2 mmol). α-Chloroacetaldehyde (0.181 g, 2.31 mmol) was added and the reaction was stirred at room temperature for 40 minutes. The reaction was quenched by the addition of cold water and taken up in 100 mL dichloromethane. The organic layer was washed once with water and once with brine, dried over magnesium sulfate, and volatiles were removed under reduced pressure to yield crude 28 (0.327 g, 85% crude yield). This material was carried forward without further purification.

Crude 28 (277 mg, 1.8 mmol) was dissolved in 10 mL anhydrous DMF. To this solution was added Potassium thiocyanate (0.35 g, 3.6 mmol) and the reaction mixture was heated to 80° C. under argon for 12 hr. The reaction mixture was concentrated under reduced pressure and chromatographed in Hexanes:Ethyl Acetate (8:1 to 7:3 gradient) to yield protected precursor 29 (0.1418 g, 44%)

Precursor 29 (0.1418 g, 0.78 mmol) was dissolved in 10 mL anhydrous diethylether and the reaction mixture was cooled to 0° C. A 1.0M solution of lithium aluminum hydride in tetrahydrofuran (0.78 mL) was slowly added. The reaction was stirred at 0° C. for 30 minutes then warmed to room temperature. The reaction mixture was quenched by the slow drop wise addition of 0.1 mL cold water, dried over Magnesium sulfate and filtered through a celite pad to afford product 30 (0.078 g, 65%) ¹H NMR (CDCl₃, 400 MHz): δ=7.20-7.15 (m, 2H), 6.77-6.71 (m, 1H), 6.69-6.67 (d, J=7.6 Hz, 1H), 3.33 (t, J=12.8 Hz, 2H), 2.75 ppm (dd, J=12.8, 6.4 Hz, 2H).

3-Methylaniline 31 (0.5 mL, 4.67 mmol) was dissolved in 25 mL ethanol. To this was added acetic acid (0.267 mL, 4.67 mmol) and sodium cyanoborohydride (0.323 g, 5.13 mmol). α-Chloroacetaldehyde (0.9 mL, 5.137 mmol) was added and the reaction was stirred at room temperature for 4 hours. The reaction was quenched by the addition of cold water and taken up in 100 mL dichloromethane. The organic layer was washed once with water and once with brine, dried over magnesium sulfate, and volatiles were removed under reduced pressure to yield crude 32. This product was chromatographed on silica gel (7:3 Hex:EtOAc) to yield pure 32 (0.78 g, quantitative).

Chloride 32 (0.78 g, 4.6 mmol) was dissolved in 20 mL anhydrous DMF. To this solution was added potassium thiocyanate (2.07 g, 21.3 mmol) and the reaction mixture was heated to 80° C. under argon for 12 hr. The reaction mixture was concentrated under reduced pressure and chromatographed in Hexanes:Ethyl Acetate (8:1 to 7:3 gradient) to yield protected precursor 33 (0.493 g, 55.6%).

Precursor 33 (0.493 g, 2.56 mmol) was dissolved in 22 mL anhydrous diethylether and the reaction mixture was cooled to 0° C. A 1.0M solution of lithium aluminum hydride in tetrahydrofuran (2.56 mL) was slowly added. The reaction was stirred at 0° C. for 30 minutes then warmed to room temperature. The reaction mixture was quenched by the slow drop wise addition of 1 mL cold water, dried over Magnesium sulfate and filtered through a celite pad to afford product 34 (0.22 g, 51%). ¹H NMR (CDCl₃, 500 MHz): δ=7.073 (t, J=8 Hz, 1H), 6.56 (d, J=7.5 Hz, 1H), 6.45 (d, J=8.5 Hz, 2H), 3.34 (t, J=6.5 Hz, 2H), 2.76 (q, J=6.5 Hz, 2H), 2.28 (s, 3H), 1.40 ppm (t, J=8 Hz, 1H). LCMS [M+H]⁺ for C₉H₁₃NS calculated 168.27 found 168.29

Example 9 Preparation of C-Terminal Thioester Proteins via Intein Fusion

This example demonstrates the construction, production, and isolation of precursor polypeptides comprising a reactive C-terminal thioester group. In particular, this example demonstrates the generation of a recombinant target polypeptide which comprises a C-terminal thioester group generated by genetic fusion of the polypeptide to the N-terminus of an engineered intein.

For these experiments, the 68-amino acid Chitin-Binding Domain (CBD) of chitinase A1 from Bacillus circulans was used as a model target polypeptide. Three different precursor polypeptide constructs, named CBD-1, CBD-2, and CBD-3 (Table 1), were prepared by fusing the gene encoding CBD to the N-terminus of an engineered variant (N198A) of intein GyrA from Mycobacterium xenopi. The C-terminal asparagine of intein GyrA was mutated to an alanine (N198A) to prevent C-terminal splicing of the intein and to allow for the introduction of a polyhistidine (His₆) tag at the C-terminus of the intein. To produce the precursor proteins prior to the protein labeling reaction according to the methods described herein, the protein constructs were expressed in E. coli cells. For the in vitro protein labeling experiments, the proteins were purified using Ni-affinity chromatography and their identity was confirmed by MALDI-TOF. For the protein labeling experiments in cell lysate, cell lysate of E. coli cells expressing the CBD-intein fusion protein was used. For the in vivo labeling experiments, E. coli cells expressing the CBD-intein fusion protein were used.

TABLE 1

Name | Target polypeptide | Intein | C-terminal tag
CBD-1 | Chitin-binding domain-RHG(OpgY)TGSGT- | Mxe GyrA (N198A) | LEHHHHHH (SEQ ID NO: 85)
CBD-2 | Chitin-binding domain-RHG(pAcF)TGSGT- | Mxe GyrA (N198A) | LEHHHHHH (SEQ ID NO: 85)
CBD-3 | Chitin-binding domain-GSGY- | Mxe GyrA (N198A) | LEHHHHHH (SEQ ID NO: 85)

The Chitin-Binding Domain (also indicated as ‘CBD’) corresponds to:

(SEQ ID NO: 86) MKIEEGKLTNPGVSAWQVNTAYTAGQLVTYNGKTYKCLQPHTSLAGWEPSNVPALWQLQNNGNNGLEL

Further experimental details for the cloning, recombinant expression, and purification of the CBD-intein fusion constructs can be found in (Smith, Vitali et al. 2011) and in (Satyanarayana, Vitali et al. 2012).

Example 10 Analysis of Rate and Efficiency of Protein Functionalization with Reagents of General Formulas (I) and (II)

This example demonstrates how a target protein can be chemo-selectively functionalized using reagents of general formula (I) and (II). In particular, this example illustrates the fast kinetics and high efficiency of protein functionalization using amino-thiol reagents of this type.

For these experiments, the intein-fusion protein CBD-3 (Table 1) was used as the precursor target polypeptide and compound 11 (FIG. 2) and compound 17 (FIG. 3) were used as examples of reagents of general formula (I) and (II), respectively. These protein labeling experiments (FIG. 7A) were performed by adding compound 11 and compound 17 at different concentrations (1, 5, and 15 mM) to a solution of CBD-3 protein (100 μM) in potassium phosphate buffer (50 mM potassium phosphate, 150 mM sodium chloride, pH 7.5). The reducing agent TCEP (20 mM) was also added to the solution to prevent thiol oxidation in the reagent and/or in the protein. The reactions were analyzed by MALDI-TOF MS analysis at 24 hours. As shown in FIG. 7C, these analyses showed the clean formation of the desired functionalized protein products, CBD-11 and CBD-17, respectively, with masses corresponding to the expected ones (CBD-11: calculated [M+H]⁺ m/z: 7976.92; observed [M+H]⁺ m/z: 7977.13; CBD-17: calculated [M+H]⁺ m/z: 7976.92; observed [M+H]⁺ m/z: 7976.33). Identical results were obtained for all the reagent concentrations tested, indicating successful functionalization of the target protein with both 11 and 17 even at the lowest reagent concentration tested (1 mM for 11 and 5 mM for 17). To measure the kinetics of these reactions, the samples were analyzed by SDS-PAGE gel densitometry at different time points (1, 2, 3, 6, 12, 24 hours). In these reactions, functionalization of the target CBD protein occurs with splicing of the precursor polypeptide (30 kDa) to give the functionalized protein (8 kDa) and spliced intein (22 kDa). Thus, the amount of functionalized protein over time can be quantified via densitometric analysis of the corresponding bands in the SDS-PAGE gel. As summarized in FIG. 7A-C, these experiments demonstrated the fast kinetics of protein functionalization with both reagents and in particular with reagent 11. In the presence of the latter, over 50% and 80% labeled protein was obtained after only 3 hours at 5 and 15 mM reagent concentrations, respectively. In both cases, nearly quantitative functionalization of the target protein was observed after 12 hours. Compared to 11, reagent 17 exhibited somewhat slower rates of protein functionalization, with quantitative yields being achieved after 24 hours (FIG. 7B). Notably, for all the reactions and all the time points, only the desired product was observed by MALDI-TOF MS. Altogether, these results demonstrate the fast kinetics and high efficiency of protein labeling achievable with reagents of general formula (I) and general formula (II).
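The densitometric quantification described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in these experiments: it assumes that stain intensity scales with protein mass, so that each band intensity is divided by its molecular weight before comparison, and that the precursor (30 kDa) and spliced intein (22 kDa) bands are the ones quantified. The function name and the band intensities are hypothetical.

```python
def percent_functionalized(precursor_intensity, intein_intensity,
                           precursor_kda=30.0, intein_kda=22.0):
    """Percent of precursor converted, from two SDS-PAGE band intensities.

    Assumes stain intensity is proportional to protein mass, so intensities
    are normalized by molecular weight to compare molar amounts.
    """
    precursor_mol = precursor_intensity / precursor_kda
    intein_mol = intein_intensity / intein_kda
    total = precursor_mol + intein_mol
    if total == 0:
        raise ValueError("both band intensities are zero")
    return 100.0 * intein_mol / total


# Hypothetical intensities (arbitrary units) for a single gel lane
example = percent_functionalized(precursor_intensity=300.0, intein_intensity=220.0)
print(f"{example:.1f}% functionalized")
```

With these hypothetical intensities the two bands correspond to equal molar amounts, so half of the precursor would be counted as functionalized. Repeating the calculation for each time point gives a labeling time course of the kind summarized in FIG. 7B.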

Example 11 Functionalization of a Target Protein with a Bioorthogonal Oxyamino Functional Group

This example demonstrates how the methods described herein can be used for introducing a non-proteinogenic, bioorthogonal functional group into a target polypeptide. In particular, this example shows how these methods can be used for functionalizing a recombinant protein with a bioorthogonal oxyamino (—ONH₂) group. The oxyamine-functionalized protein can then be used to further couple the target polypeptide with another chemical species or to a solid support via methods known in the art (e.g. via oxime ligation of the oxyamine-functionalized protein with a chemical species or solid support functionalized with an oxyamine-reactive functional group such as a ketone, aldehyde, or α-keto-acid group).

For these experiments, the intein-fusion protein CBD-1 (Table 1) was used as the precursor target polypeptide and reagent 8 (FIG. 2) was used as an example of a reagent of general formula (I) comprising a bioorthogonal oxyamine (—ONH₂) as the R group. The protein labeling reaction (FIG. 11A) was carried out by adding reagent 8 (10 mM) to a solution of purified CBD-1 (100 μM) in phosphate buffer (50 mM, pH 7.5). The extent of protein labeling over time was determined by SDS-PAGE densitometric analysis as described above, and formation of the desired oxyamine-functionalized protein was confirmed by MALDI-TOF MS. These experiments show that about 40% and over 60% of the target protein was functionalized after 2 and 5 hours, respectively (FIG. 11B). Also in this case, the desired functionalized protein, CBD-8, was the only product formed in the reaction as determined by MALDI-TOF MS analysis (FIG. 11C).

In another experiment, the intein-fusion protein CBD-3 was reacted with different concentrations (1, 5, 15 mM) of the oxyamine-comprising reagent 9 (FIG. 2) under conditions identical to those indicated above (100 μM protein, 20 mM TCEP, KPi buffer (pH 7.5), room temperature) (FIG. 12A). These experiments showed clean formation of a single product corresponding to the desired CBD-9 conjugate at all the reagent concentrations tested (CBD-9: calculated [M+H]+ m/z: 8059.03; observed [M+H]+ m/z: 8058.68), as shown by the representative MALDI-TOF MS spectrum provided in FIG. 12C. In addition, even faster protein labeling kinetics were observed for reagent 9 as compared to 8, as summarized in the graph of FIG. 12B. For example, over 85% protein labeling was achieved with 9 at 1 mM in only 6 hours, whereas nearly quantitative (90-98%) labeling of the target protein was achieved at higher reagent concentrations (5 and 15 mM) within only 3 hours (FIG. 12B). Results similar to those observed with 9 were obtained with reagents 10A and 10B (FIG. 2).
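Because the reagent is present in at least 10-fold molar excess over the protein in these reactions, the time-course data can be approximated by a pseudo-first-order model. The following Python sketch is an illustrative simplification only: the single-exponential model is an assumption that was not applied in the experiments described here, the function names are hypothetical, and since the text reports "over 85%" labeling, the resulting rate constant is best read as a lower bound.

```python
import math


def pseudo_first_order_k(fraction_labeled, hours):
    """Observed rate constant (per hour), assuming f(t) = 1 - exp(-k * t)."""
    if not 0.0 < fraction_labeled < 1.0:
        raise ValueError("fraction_labeled must be between 0 and 1 (exclusive)")
    return -math.log(1.0 - fraction_labeled) / hours


def fraction_at(k_obs, hours):
    """Fraction labeled predicted by the same single-exponential model."""
    return 1.0 - math.exp(-k_obs * hours)


# Values from the text: 1 mM reagent 9, 100 uM protein, 85% labeling at 6 hours
fold_excess = 1e-3 / 100e-6
k_obs = pseudo_first_order_k(0.85, 6.0)
half_life_h = math.log(2) / k_obs

print(f"{fold_excess:.0f}-fold excess, k_obs = {k_obs:.3f} per hour, "
      f"half-life = {half_life_h:.1f} h")
```

The same helper can be applied to each reagent concentration to compare apparent rates, provided that the single-exponential assumption is checked against the full time course in FIG. 12B.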

Altogether, these results demonstrate the usefulness and efficiency of the methods described herein for labeling a precursor protein with a bio-orthogonal functional group under mild and catalyst-free (i.e. thiol free) reaction conditions. These experiments also show how different linker units can be used to link the reactive 1-amino-2-mercaptomethyl-aryl moiety to a desired R group (—ONH₂ group). The different linker units can be useful to improve the physico-chemical properties of the reagents such as their water-solubility and/or varying the spacing distance between the R group and the reactive amino-thiol moiety, according to the specific needs for a given application. For example, the triazole-based linker in reagent 9 improves the water solubility and provides a larger spacing distance between the oxyamino group and the 1-amino-2-mercaptomethyl-aryl moiety as compared to reagent 8.

Example 12 Functionalization of a Target Protein with a Bioorthogonal Azide Functional Group

This example provides another demonstration of how the methods described herein can be used for introducing a non-proteinogenic, bioorthogonal functional group into a target polypeptide. In particular, this example shows how these methods can be used for functionalizing a recombinant protein with a bioorthogonal azido (—N₃) group. The azide-functionalized protein can then be used to further couple the target polypeptide with another chemical species or to a solid support via methods known in the art (e.g. via Cu(I)-catalyzed ligation of the azide-functionalized protein with a chemical species or solid support containing an alkyne functional group).

For these experiments, the intein-fusion protein CBD-3 (Table 1) was used as the precursor target polypeptide and reagent 6 (FIG. 2) was used as an example of a reagent of general formula (I) comprising a bioorthogonal azido group (—N₃) as the R group. The protein labeling reaction was carried out as described in Example 11 by adding reagent 6 (10 mM) to a solution of purified CBD-3 (100 μM) in phosphate buffer (50 mM, pH 7.5). MALDI-TOF MS analysis confirmed the formation of the desired azide-functionalized protein, CBD-6 (calculated: [M+H]+ m/z: 7988.96 observed: [M+H]+ m/z: 7988.72), demonstrating the efficiency of the method toward C-terminal labeling of a protein with a bioorthogonal azide functionality.
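Agreement between calculated and observed MALDI-TOF masses, such as that reported above for CBD-6, can be expressed as an absolute and a relative mass error. The following Python sketch shows one common way to do this. The function name is illustrative, and no acceptance threshold is implied by the source.

```python
def mass_error(calculated_mz, observed_mz):
    """Return (error in Da, error in ppm) of an observed m/z versus calculated."""
    delta = observed_mz - calculated_mz
    ppm = 1e6 * delta / calculated_mz
    return delta, ppm


# CBD-6 values taken from the text above
delta_da, delta_ppm = mass_error(7988.96, 7988.72)
print(f"error: {delta_da:.2f} Da ({delta_ppm:.0f} ppm)")
```

The same function can be applied to any of the calculated and observed [M+H]⁺ pairs reported in these examples.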

Example 13 Protein Labeling with a Fluorescent Probe

This example demonstrates how the methods described herein can be applied for labeling a target protein with a fluorophore molecule. In particular, this example illustrates an embodiment of the invention wherein a coumarin-comprising reagent of general formula (I) was used for covalently linking a fluorescent dye to a protein.

As schematically indicated in FIG. 8A, these protein labeling studies were performed by reacting the coumarin-comprising reagent 23 (FIG. 4) (15 mM) with the intein-fusion protein CBD-2 (100 μM) in potassium phosphate buffer (50 mM KPi, 150 mM NaCl, pH 7.5). TCEP (20 mM) was also added to the solution to prevent thiol oxidation in the reagent or in the protein. As described above, the reactions were analyzed by densitometric analysis of SDS-PAGE gels to measure the extent of protein functionalization and by MALDI-TOF MS to confirm the formation of the desired product. MS analyses revealed the formation of the desired coumarin-functionalized protein, CBD-23 (calculated [M+H]⁺ m/z: 8823.72; observed [M+H]⁺ m/z: 8823.07), as the only product at all the time points tested (1, 5, 12 hours), as indicated by the representative MALDI-TOF spectrum in FIG. 8C. According to SDS-PAGE gel densitometry, the percentage of protein labeling (i.e. percentage of CBD-23 formed) after 1 and 12 hours was estimated to be 40% and 60%, respectively (FIG. 8B, left panel). To further confirm the occurrence of protein labeling with the fluorescent probe, the protein gel was visualized under a fluorescence detector (λ_(ex): 365 nm). As shown by the fluorescent imaging gel in FIG. 8B (right panel), this analysis revealed the occurrence of fluorescence only in correspondence to the CBD band, confirming the selective labeling of the target protein with the fluorescent probe. Altogether, these results demonstrate the usefulness and efficiency of the methods described herein for tagging a precursor protein with a fluorescent probe under mild, physiologically relevant reaction conditions. In addition, they demonstrate how this protein functionalization procedure could be carried out without the need for exogenous thiol catalysts. Finally, these results demonstrate how the functionalized protein product could be selectively visualized via fluorescence imaging.

Example 14 Protein Labeling with a Biotin Affinity Tag

This example demonstrates how the methods described herein can be applied for labeling a target protein with an affinity tag molecule. In particular, this example illustrates an embodiment of the invention wherein a biotin-comprising reagent of general formula (I) was used for covalently linking the affinity tag biotin to a protein.

Under standard reaction conditions (50 mM potassium phosphate (pH 7.5), 150 mM sodium chloride, 20 mM TCEP; FIGS. 9A and 13A), the biotin-comprising reagent 26 (FIG. 5) was added to a solution of CBD-3 (100 μM) at different concentrations (1, 5, and 15 mM). As illustrated by the representative MALDI-TOF MS spectra in FIGS. 9C and 13C, the desired functionalized product, CBD-26, was obtained as the only product. The kinetics of these protein labeling reactions were then investigated by measuring the extent of protein labeling over time by SDS-PAGE analysis as described above. As summarized in FIGS. 9B and 13B, these experiments show fast and efficient functionalization of the target protein with the biotinylating reagent within a short time. For example, nearly quantitative labeling was achieved in the presence of 5 mM 26 within 6 hours.

In another experiment, a different intein-fusion construct, i.e. CBD-2 (100 μM), was reacted with the biotinylating reagent 26 (15 mM) under standard reaction conditions. Also in this case, clean formation of a single product corresponding to the expected mass of the CBD-26 conjugate (calculated [M+H]⁺ m/z: 8837.93; observed [M+H]⁺ m/z: 8837.96) was observed at each time point tested (1, 5, 12 hours). Based on SDS-PAGE densitometric analysis, the amount of protein labeling after 1 and 12 hours was determined to be about 50% and >70%, respectively (FIG. 10).

Altogether, these results demonstrate the efficiency of the methods described herein for labeling a target protein with an affinity tag molecule under mild, physiologically relevant conditions and without the need for an exogenous thiol catalyst.

Example 15 Labeling of a Target Protein in Cell Lysate and Isolation of the Functionalized Protein by Affinity Chromatography

This example demonstrates how the methods described herein can be used for labeling a target protein in a complex biologically-derived medium such as a cell lysate. In particular, this example shows how a target protein can be labeled with a fluorescent label molecule (coumarin) or an affinity label molecule (biotin) in a complex biological sample. The example further demonstrates how this procedure can be useful for isolating the biotinylated target protein from the complex mixture via biotin affinity capturing.

A cell lysate of E. coli cells expressing the intein-fusion construct CBD-2 was prepared by resuspending the cells from a 25 mL-culture in 1 mL of 50 mM potassium phosphate buffer (pH 7.5) followed by sonication and centrifugation at 13,000 rpm for 30 minutes. Either reagent 23 or reagent 26 (15 mM) was then added to 300 uL of the cell lysate. After a 6-hour incubation at room temperature, the sample containing reagent 23 was passed through 100 uL chitin beads. After washing the beads with phosphate buffer, the chitin-bound material was eluted with 100 uL 75% acetonitrile in water. MALDI-TOF MS analysis of the eluate revealed the occurrence of the desired ligation product (CBD-23) as the only product (calculated [M+H]⁺ m/z: 8823.72; observed [M+H]⁺ m/z: 8823.6). After a 6-hour incubation at room temperature, the sample containing reagent 26 was passed through 300 uL of streptavidin beads. After washing the beads with phosphate buffer, the streptavidin-bound material was eluted with 250 uL 75% acetonitrile in water. MALDI-TOF MS analysis of the eluate revealed the occurrence of the desired ligation product (CBD-26) as the only product (calculated [M+H]⁺ m/z: 8837.93; observed [M+H]⁺ m/z: 8837.58). Overall, these results demonstrate the functionality and utility of the methods described herein for selective labeling of an intein-fused target protein in a complex biological system, which further proves the chemo- and site-selectivity and bioorthogonal nature of these protein labeling procedures. They also show the utility of these methods in providing a way to label a target protein with an affinity tag so that this protein can be rapidly isolated from a complex mixture.

Example 16 Labeling of a Target Protein with a Bioorthogonal Functional Group in Cell Lysate

This example provides a demonstration of how the methods described herein can be used for labeling a target protein with a bio-orthogonal functional group in the form of an oxyamino group (—ONH₂) in a complex biologically-derived medium such as a cell lysate.

A cell lysate of E. coli cells expressing the intein-fusion construct CBD-3 was prepared by resuspending the cells from a 25 mL-culture in 1 mL of 50 mM potassium phosphate buffer (pH 7.5) followed by sonication and centrifugation at 13,000 rpm for 30 minutes. Either reagent 9 or reagent 26 (10 mM) was added to 300 uL of the cell lysate. After a 5-hour incubation at room temperature, both reactions were analyzed by MALDI-TOF MS. As shown in FIG. 14, these analyses revealed the occurrence of the desired ligation products CBD-9 and CBD-26 as the only ligation products (CBD-9: calculated [M+H]⁺ m/z: 8059.03; observed [M+H]⁺ m/z: 8058.35; CBD-26: calculated [M+H]⁺ m/z: 8259.33; observed [M+H]⁺ m/z: 8295.32). Overall, these results demonstrate the functionality and utility of the methods described herein for selective labeling of a recombinant target protein in a complex biological system.

Example 17 Protein Labeling in Living Cells

This example demonstrates how the methods described herein can be used to selectively functionalize a target protein inside a living cell. In particular, this example shows how these methods can be used to label a target protein with a biotin affinity tag molecule inside a bacterial cell and how the functionalized protein can then be isolated by affinity chromatography.

25 mL cultures of E. coli cells expressing the intein-fusion protein CBD-3 (Table 1) were harvested by centrifugation at 4,000 rpm for 20 minutes. The cell pellets were then resuspended in 1 mL of 50 mM potassium phosphate buffer (pH 7.5) supplemented with compound 26 at either 5 mM or 10 mM in the presence of TCEP (15 mM). After 8 hours of incubation at room temperature, the cells were harvested by centrifugation and the cell pellets were extensively washed with buffer. The cell pellets were then resuspended in 1 mL of phosphate buffer, lysed by sonication, and the cell lysate was clarified via centrifugation. The cell lysates were analyzed by MALDI-TOF. As shown in FIG. 15A-B, these analyses revealed the presence of the desired ligation product CBD-26 (calculated [M+H]⁺ m/z: 8259.33; observed [M+H]⁺ m/z: 8259.32) at both reagent concentrations. In each case, a small amount of CBD-COOH was also observed, this species likely resulting from spontaneous hydrolysis of the intein-fusion product during expression. To further confirm the formation of the desired biotin-protein conjugate, the cell lysates were passed over streptavidin-functionalized polyacrylamide beads. After washing with buffer, the beads were resuspended in 50:50 acetonitrile:water to elute the streptavidin-bound material. MALDI-TOF MS analysis of the eluate revealed the occurrence of a single species with a mass corresponding to the desired biotinylated protein, CBD-26 (calculated [M+H]⁺ m/z: 8259.33; observed [M+H]⁺ m/z: 8259.116; FIG. 16A-C, Graphic C).

Overall, these results demonstrated the selective functionalization of a target intein-fusion protein inside a living cell using the methods described herein. They also show how, after in vivo labeling of the target protein with a biotin affinity tag, the product of the functionalization reaction can be isolated via affinity chromatography. Furthermore, since during the biotin capturing process the functionalized (i.e. biotinylated) target protein is immobilized on the streptavidin-coated resin beads via a tight biotin-streptavidin complex, these experiments also show how the methods described herein can be used to immobilize a target protein onto a solid support.

Example 18 Fluorescent Tagging of a Target Protein via Bifunctional Labeling Reagents

This example demonstrates how the methods described herein can be used for introducing a reactive functional group into a target polypeptide so that the functionalized protein can be further modified with a chemical species of interest such as a fluorescent probe molecule. In particular, it shows how the methods described herein can be used for the preparation of an oxyamine-functionalized target protein which can then be further modified with a coumarin-based fluorescent probe via an oxime ligation between the oxyamino group introduced into the protein and the keto group in the coumarin dye.

Under standard reaction conditions (100 uM protein, 50 mM potassium phosphate (pH 7.5), 150 mM sodium chloride, 20 mM TCEP), the intein-fusion protein CBD-3 (Table 1) was first incubated with reagent 9 (FIG. 2) at a concentration of 1 mM for 5 hours at room temperature. Then, 3-acetyl-coumarin (10 mM) was added. After adjusting the pH to 5, the reaction mixture was incubated for 12 hours at room temperature and then analyzed by MALDI-TOF MS. These analyses revealed the formation of the desired CBD-9-coumarin conjugate (calculated [M+H]⁺ m/z: 8229.2; observed [M+H]⁺ m/z: 8228.12).
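The stoichiometry of the two-step procedure above can be made explicit by expressing each reagent concentration as molar equivalents relative to the protein. The following Python sketch is illustrative only; the helper `molar_equivalents` is a hypothetical name, and the calculation assumes that adding 3-acetyl-coumarin and adjusting the pH do not appreciably dilute the reaction mixture.

```python
def molar_equivalents(reagent_mM, protein_uM):
    """Molar equivalents of a reagent relative to the protein.

    reagent_mM: reagent concentration in millimolar
    protein_uM: protein concentration in micromolar
    """
    return (reagent_mM * 1000.0) / protein_uM


PROTEIN_UM = 100.0  # protein concentration under standard reaction conditions

# Concentrations as stated in this example (dilution assumed negligible).
reagents_mM = {"reagent 9": 1.0, "3-acetyl-coumarin": 10.0, "TCEP": 20.0}

for name, conc in reagents_mM.items():
    print(f"{name}: {molar_equivalents(conc, PROTEIN_UM):.0f} equivalents")
```

Under these assumptions, the oxyamine reagent is used at a 10-fold excess and the coumarin ketone at a 100-fold excess over the protein.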

Example 19 In vitro protein functionalization with N-(2-mercaptoethyl)-amino-aryl-based reagent

This example demonstrates how the general strategy schematically illustrated in FIG. 1 can be applied for labeling of a target protein. In particular, this example illustrates an embodiment of the invention wherein a reagent of general formula (V) is used to functionalize a target protein in vitro.

For these studies, the intein-fusion protein CBD-2 was incubated with reagent 30 (FIG. 6) at 15 mM under standard reaction conditions (100 uM protein, 50 mM potassium phosphate (pH 7.5), 150 mM sodium chloride, 20 mM TCEP). At different time points, the reaction mixture was analyzed by MALDI-TOF MS to monitor product formation and by SDS-PAGE to measure the extent of protein labeling. MALDI-TOF MS analysis revealed the formation of the desired CBD-2-30 ligation adduct as the only observable product (calculated [M+H]⁺ m/z: 8525.54; observed [M+H]⁺ m/z: 8525.7). To assess the occurrence of an S,N acyl transfer in the functionalized protein adduct, iodoacetamide (20 mM) was added to the reaction mixture. At the 4 hour time point, 90% of the functionalized protein adduct was converted to the corresponding S-alkylated product (calculated [M+H]⁺ m/z: 8582.54; observed [M+H]⁺ m/z: 8582.99), confirming the occurrence of the desired S,N acyl transfer (i.e., rearrangement of thioester ligation product ‘a’ into the amide ligation product ‘b’ in FIG. 1). To measure the extent of protein labeling, the samples were also analyzed by SDS-PAGE followed by gel densitometry. These studies showed the occurrence of 30-induced splicing of the precursor protein and indicated as much as 70% labeling of the target protein after 4 hours (FIG. 17A-B). Altogether, these results demonstrate the functionality of reagents of the type (V)-(VIII) for functionalization of a protein of interest in vitro according to the general strategy of FIG. 1.
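The mass shift that accompanies the iodoacetamide trapping step can be checked against the values reported above. S-alkylation of a free thiol with iodoacetamide converts S-H into S-CH₂C(O)NH₂, a net addition of C₂H₃NO. The following Python sketch is illustrative only; it assumes standard average atomic masses rounded to three decimals, and the helper `formula_mass` is a hypothetical name.

```python
# Average atomic masses (assumed values, rounded to three decimals).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_mass(composition):
    """Sum the average atomic masses for an {element: count} mapping."""
    return sum(AVERAGE_MASS[element] * count for element, count in composition.items())


# Net change on S-alkylation with iodoacetamide: +C2H3NO (carbamidomethyl).
expected_shift = formula_mass({"C": 2, "H": 3, "N": 1, "O": 1})

# Shifts derived from the m/z values reported in this example.
calculated_shift = 8582.54 - 8525.54
observed_shift = 8582.99 - 8525.7

print(f"expected:   {expected_shift:.2f}")
print(f"calculated: {calculated_shift:.2f}")
print(f"observed:   {observed_shift:.2f}")
```

All three values are close to 57 mass units, which is consistent with a single S-alkylation event and therefore with a free thiol being present in the rearranged amide product.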

Example 20 In vivo protein functionalization with N-(2-mercaptoethyl)-amino-aryl-based reagent

This example further demonstrates how the general strategy schematically illustrated in FIG. 1 can be applied for labeling of a target protein. In particular, this example illustrates another embodiment of the invention wherein a reagent of general formula (V) is used to functionalize a target protein inside a living cell.

For these experiments, a 25 mL culture of E. coli cells expressing the intein-fusion protein CBD-3 (Table 1) was harvested by centrifugation at 4,000 rpm for 20 minutes. The cell pellet was then resuspended in 1 mL of 50 mM potassium phosphate buffer (pH 7.5) supplemented with 10 mM compound 34 and 15 mM TCEP. After an 8-hour incubation at room temperature, the cells were harvested by centrifugation and the cell pellet was extensively washed with buffer. The cell pellet was then resuspended in 1 mL of phosphate buffer, lysed by sonication, and the cell lysate was clarified via centrifugation. As shown in FIG. 18A-B, MALDI-TOF MS analysis of the cell lysate revealed the presence of the desired ligation product CBD-34 (calculated [M+H]⁺ m/z: 7960.7; observed [M+H]⁺ m/z: 7960.7) in addition to a small amount of CBD-COOH, likely resulting from spontaneous hydrolysis of the intein-fusion product during expression. Altogether, these results demonstrate the functionality of reagents of the type (V)-(VIII) for functionalization of a protein of interest inside a cell according to the general strategy of FIG. 1.

REFERENCES

-   Calloway, N. T., M. Choob, et al. (2007). Chembiochem 8(7): 767-774.
-   Chattopadhaya, S., F. B. Abu Bakar, et al. (2009). Methods Enzymol 462: 195-223.
-   Chen, I., M. Howarth, et al. (2005). Nat Methods 2(2): 99-104.
-   Cohen, J. D., P. Zou, et al. (2012). Chembiochem 13(6): 888-894.
-   Crivat, G. and J. W. Taraska (2012). Trends Biotechnol 30(1): 8-16.
-   Frost, J. R., F. Vitali, et al. (2013). Chembiochem 14(1): 147-160.
-   Hermanson, G. T. (1996). Bioconjugate Techniques. San Diego, Academic Press.
-   Jing, C. and V. W. Cornish (2011). Acc Chem Res 44(9): 784-792.
-   Keppler, A., S. Gendreizig, et al. (2003). Nature Biotechnology 21(1): 86-89.
-   Los, G. V., L. P. Encell, et al. (2008). ACS Chemical Biology 3(6): 373-382.
-   Muir, T. W., D. Sondhi, et al. (1998). Proc Natl Acad Sci U S A 95(12): 6705-6710.
-   Paulus, H. (2000). Annu Rev Biochem 69: 447-496.
-   Popp, M. W., J. M. Antos, et al. (2007). Nat Chem Biol 3(11): 707-708.
-   Satyanarayana, M., F. Vitali, et al. (2012). Chemical Communications 48(10): 1461-1463.
-   Shin, Y., K. A. Winans, et al. (1999). J Am Chem Soc 121(50): 11684-11689.
-   Smith, J. M., F. Vitali, et al. (2011). Angew Chem Int Ed Engl 50(22): 5075-5080.
-   Yin, J., F. Liu, et al. (2004). J Am Chem Soc 126(25): 7754-7755.

The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are intended to fall within the scope of the appended claims.

While embodiments of the present disclosure have been particularly shown and described with reference to certain examples and features, it will be understood by one skilled in the art that various changes in detail may be effected therein without departing from the spirit and scope of the present disclosure as defined by claims that can be supported by the written description and drawings. Further, where exemplary embodiments are described with reference to a certain number of elements it will be understood that the exemplary embodiments can be practiced utilizing either less than or more than the certain number of elements.

All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.

The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. 

What is claimed is:
 1. A method for forming a covalent linkage between a polypeptide and a chemical species, the method comprising the steps of: a) providing a polypeptide, wherein the polypeptide comprises a thioester group and/or wherein the polypeptide is C-terminally fused to an intein; b) providing a chemical reagent of formula (I), (II), (III), (IV), (V), (VI), (VII) or (VIII):

or a salt of the chemical reagent, wherein: i) R is a chemical species to be covalently linked to the polypeptide, ii) R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group, iii) X, Y, W, and Z are hydrogen and/or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, wherein each R′ is independently H, alkyl, or substituted alkyl, iv) n is 2 or 3; and v) L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)—, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group; and c) allowing the polypeptide to react with the chemical reagent so that a covalent linkage between the reagent and the polypeptide is formed.
 2. The method of claim 1, wherein R is a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, or a quantum dot, or any combination thereof.
 3. The method of claim 1, wherein: R is a bioorthogonal functional group selected from the group consisting of —NR′NR′₂, —C(O)NR′NR′₂, —ONH₂, —N₃, —C≡CR′, —CR′═CR′₂, —PR′₂, 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups, and each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.
 4. The method of claim 1, wherein R is a fluorescent molecule selected from the group consisting of a coumarin derivative, a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative, a phthalocyanine derivative, and an oxazine derivative.
 5. The method of claim 1, wherein R is biotin, a biotin analogue, a poly(ethyleneglycol) molecule, or a perfluorinated alkyl chain CF₃—(CF₂)_(m)— where m=3-15.
 6. The method of claim 1, wherein R is a resin, a nanoparticle, a functionalized surface, or a microarray.
 7. The method of claim 1, wherein the intein is a naturally occurring intein, an engineered variant of a naturally occurring intein, a fusion of the N-terminal and C-terminal fragments of a naturally occurring split intein, or a fusion of the N-terminal and C-terminal fragments of an artificial split intein.
 8. The method of claim 1, wherein the intein is a polypeptide of SEQ ID NO:1-76, or an engineered variant thereof.
 9. The method of claim 8, wherein: the C-terminal asparagine, aspartic acid, or glutamine residue in the intein is mutated to an amino acid other than asparagine, aspartic acid, or glutamine, or the N-terminal serine is mutated to a cysteine residue and the C-terminal asparagine, aspartic acid, or glutamine residue in the intein is mutated to an amino acid other than asparagine, aspartic acid, or glutamine.
 10. The method of claim 9, wherein the intein is C-terminally fused to a polypeptide affinity tag selected from the group consisting of polyhistidine tag, Avi-Tag, FLAG tag, Strep-tag II, c-myc tag, S-Tag, calmodulin-binding peptide, streptavidin-binding peptide, chitin-binding domain, glutathione S-transferase, and maltose-binding protein.
 11. The method of claim 1, wherein the polypeptide C-terminally fused to the intein comprises one or a plurality of the features selected from the group consisting of: the residue at position 1 prior to the intein (hereinafter “intein-1” or “I-1”) being F, Y, A, T, W, N, R or Q; the residue at position 2 prior to the intein (hereinafter “intein-2” or “I-2”) being G, P, or S; and the residue at position 3 prior to the intein (hereinafter “intein-3” or “I-3”) being G or S.
 12. The method of claim 1, wherein the intein-fused polypeptide is inside a cell or associated with the exterior surface of a cell membrane.
 13. The method of claim 12, wherein the cell is a prokaryotic or eukaryotic cell.
 14. The method of claim 1, wherein: R₁, X, Y, and Z are hydrogen atoms, L is selected from the group consisting of —C(O)NR′—, —C(O)NR′CH₂C(O)—, —C(O)NR′(CH₂)n-, and —C(O)NR′(CH₂—CH₂—O)n-, R′ is a hydrogen, alkyl or aryl group, and n is an integer number from 1 to 15.
 15. The method of claim 14, wherein R is selected from the group consisting of biotin, a biotin analogue, and a coumarin derivative.
 16. The method of claim 1, wherein the reagent is: a) a compound of formula (I), wherein: R₁, X, Y, and Z are hydrogen atoms, R is —ONH₂ or —N₃, and L is a single bond; b) a compound of formula (I), wherein: R₁, X, Y, and Z are hydrogen atoms, R is —ONH₂, and L is a linker or linker group of formula

c) a compound of formula (I), wherein: R₁, X, Y, and Z are hydrogen atoms, R is 7-amino-4-(trifluoromethyl)-2H-chromen-2-one, and L is —C(O)NHCH₂C(O)—; or d) a compound of formula (I), wherein: R₁, X, Y, and Z are hydrogen atoms, R is biotin, and L is —C(O)NH(CH₂)₃NH—.
 17. A kit for forming a covalent linkage between a polypeptide and a chemical species, the kit comprising: a) at least one chemical reagent of formula (I), (II), (III), (IV), (V), (VI), (VII), or (VIII), or a salt of the reagent; and b) one or a plurality of containers, wherein at least one container comprises a pre-selected or desired amount of at least one of the chemical reagents of formula (I), (II), (III), (IV), (V), (VI), (VII), or (VIII), or a salt of the reagent, wherein: i) R is the chemical species which is to be covalently linked to the polypeptide, ii) R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group, iii) X, Y, W, and Z are hydrogen and/or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group, iv) n is 2 or 3, and v) L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)—, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.
 18. The kit of claim 17, wherein R is a functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, an isotopic label molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, or a quantum dot, or any combination thereof.
 19. The kit of claim 17, wherein R is a bioorthogonal functional group selected from the group consisting of —NR′NR′₂, —C(O)NR′NR′₂, —ONH₂, —N₃, —C≡CR′, —CR′═CR′₂, —PR′₂, 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups, wherein each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.
 20. The kit of claim 17, wherein R is a fluorescent molecule selected from the group consisting of a coumarin derivative, a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative, a phthalocyanine derivative, and an oxazine derivative.
 21. The kit of claim 17, wherein R is biotin, a biotin analogue, or a perfluorinated alkyl chain CF₃—(CF₂)_(m)— where m=3-15.
 22. The kit of claim 17, wherein the at least one reagent comprises at least one compound selected from the group consisting of: a) a compound of formula (I), wherein: R₁, X, Y, and Z are hydrogen atoms, R is —ONH₂ or —N₃, and L is a single bond; b) a compound of formula (I), wherein: R₁, X, Y, and Z are hydrogen atoms, R is —ONH₂, and L is a linker or linker group of formula

c) a compound of formula (I), wherein: R₁, X, Y, and Z are hydrogen atoms, R is 7-amino-4-(trifluoromethyl)-2H-chromen-2-one, and L is —C(O)NHCH₂C(O)—; or d) a compound of formula (I), wherein: R₁, X, Y, and Z are hydrogen atoms, R is biotin, and L is —C(O)NH(CH₂)₃NH—.
 23. The kit of claim 17 further comprising a functionalized solid support with which the functional group R reacts.
 24. The kit of claim 18, wherein the solid support is a resin, a nanoparticle, a surface, or a microarray.
 25. A compound having the formula (I), (II), (III), (IV), (V), (VI), (VII) or (VIII):

or a salt thereof, wherein: i) R is a bioorthogonal functional group, a label molecule, a tag molecule, an affinity label molecule, a photoaffinity label, a dye, a chromophore, a fluorescent molecule, a phosphorescent molecule, a chemiluminescent molecule, an energy transfer agent, a photocrosslinker molecule, a redox-active molecule, a spin label molecule, a metal chelator, a metal-comprising moiety, a heavy atom-comprising-moiety, a radioactive moiety, a contrast agent molecule, a MRI contrast agent, an isotopically labeled molecule, a PET agent, a photocaged moiety, a photoisomerizable moiety, a chemically cleavable group, a photocleavable group, an electron dense group, a magnetic group, an amino acid, a polypeptide, an antibody or antibody fragment, a carbohydrate, a monosaccharide, a polysaccharide, a nucleotide, a nucleoside, a DNA, a RNA, a siRNA, a polynucleotide, an antisense polynucleotide, a peptide nucleic acid (PNA), a fatty acid, a lipid, a cofactor, biotin, a biotin analogue, a biomaterial, a polymer, a water-soluble polymer, a polyethylene glycol derivative, a water-soluble dendrimer, a cyclodextrin, a small molecule, a protein-, nucleic acid-, or receptor-binding molecule, a biologically active molecule, a drug or drug candidate, a cytotoxic molecule, a solid support, a surface, a resin, a nanoparticle, a quantum dot, or any combination thereof, ii) R₁ is hydrogen, a substituted or non-substituted aliphatic group, or a substituted or non-substituted aryl group, iii) X, Y, W, and Z are hydrogen and/or non-hydrogen substituents selected from the group consisting of alkyl, heteroatom-comprising alkyl, alkenyl, heteroatom-comprising alkenyl, alkynyl, heteroatom-comprising alkynyl, aryl, heteroatom-comprising aryl, alkoxy, heteroatom-comprising alkoxy, aryloxy, heteroatom-comprising aryloxy, halo, —OH, —OR′, —SR′, —COOH, —COOR′, —CONR′₂, —NR′₂, —NO₂, —SO₃R′, —SO₂NR₂′, —C≡N, —O—C≡N, —P(O)_(k)R′ where k is 2 or 3, and —S—C≡N, wherein each R′ is independently H, alkyl, or substituted alkyl, iv) n is 2 or 3; and v) L is a linker or a linker group selected from the group consisting of a single bond, C₁-C₂₄ alkyl, C₁-C₂₄ substituted alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₁-C₂₄ substituted heteroatom-comprising alkyl, C₂-C₂₄ alkenyl, C₂-C₂₄ substituted alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ substituted heteroatom-comprising alkenyl, C₂-C₂₄ alkynyl, C₂-C₂₄ substituted alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₂-C₂₄ substituted heteroatom-comprising alkynyl, C₅-C₂₄ aryl, C₅-C₂₄ substituted aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₅-C₂₄ substituted heteroatom-comprising aryl, C₁-C₂₄ alkoxy, C₅-C₂₄ aryloxy, —O—, —S—, —NR′—, —C(O)—, —C(S)—, —C(O)NR′—, —C(S)NR′—, —N(R′)C(O)—, —S(O)_(k)— where k is 1, 2, or 3, —S(O)_(k)N(R′)—, —N(R′)C(O)N(R′)—, —N(R′)C(S)N(R′)—, —N(R′)S(O)_(k)N(R′)—, —N(R′)—N═, —C(R′)═N—, —C(R′)═N—N(R′)—, —C(R′)═N—N═, —C(R′)₂—N═N—, and —C(R′)₂—N(R′)—N(R′)—, where each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group, the compound being reactive with a polypeptide, wherein the polypeptide comprises a thioester group and/or wherein the polypeptide is C-terminally fused to an intein, and wherein reaction of the compound with the polypeptide forms a covalent linkage between the compound and the polypeptide.
 26. The compound of claim 25, wherein: R is a bioorthogonal functional group selected from the group consisting of —NR′NR′₂, —C(O)NR′NR′₂, —ONH₂, —N₃, —C≡CR′, —CR′═CR′₂, —PR′₂, 2-cyanobenzothiazole, tetrazole, tetrazine, aziridine, dihydroazirine, and norbornadiene groups, and each R′ is independently an H, an aliphatic, a substituted aliphatic, an aryl, or a substituted aryl group.
 27. The compound of claim 25, wherein R is a fluorescent molecule selected from the group consisting of a coumarin derivative, a naphthalene derivative, a pyrene derivative, a fluorescein derivative, a rhodamine derivative, a naphthoxanthene derivative, a phenanthridine derivative, a boron difluoride dipyrromethene (BODIPY) derivative, a cyanine derivative, a phthalocyanine derivative, and an oxazine derivative.
 28. The compound of claim 25, wherein R is biotin, a biotin analogue, a poly(ethyleneglycol) molecule, or a perfluorinated alkyl chain CF₃—(CF₂)_(m)— where m=3-15.
 29. The compound of claim 25, wherein R is a resin, a nanoparticle, a functionalized surface, or a microarray.
 30. The compound of claim 25, wherein: R₁, X, Y, and Z are hydrogen atoms, L is selected from the group consisting of —C(O)NR′—, —C(O)NR′CH₂C(O)—, —C(O)NR′(CH₂)n-, and —C(O)NR′(CH₂—CH₂—O)n-, R′ is a hydrogen, alkyl or aryl group, and n is an integer number from 1 to 15.
 31. The compound of claim 25, wherein R is selected from the group consisting of biotin, a biotin analogue, and a coumarin derivative.
 32. The compound of claim 25 having formula (I), wherein: a) R₁, X, Y, and Z are hydrogen atoms, R is —ONH₂ or —N₃, and L is a single bond; b) R₁, X, Y, and Z are hydrogen atoms, R is —ONH₂, and L is a linker or linker group of formula

c) R₁, X, Y, and Z are hydrogen atoms, R is 7-amino-4-(trifluoromethyl)-2H-chromen-2-one, and L is —C(O)NHCH₂C(O)—; or d) R₁, X, Y, and Z are hydrogen atoms, R is biotin, and L is —C(O)NH(CH₂)₃NH—. 